On the Hidden Markov Model and Dynamic 
Time Warping for Speech Recognition—A 
Unified View 


By B.-H. JUANG* 
(Manuscript received January 17, 1984) 


This paper gives a unified theoretical view of the Dynamic Time Warping 
(DTW) and the Hidden Markov Model (HMM) techniques for speech recog- 
nition problems. The application of hidden Markov models in speech recog- 
nition is discussed. We show that the conventional dynamic time-warping 
algorithm with Linear Predictive (LP) signal modeling and distortion mea- 
surements can be formulated in a strictly statistical framework. It is further 
shown that the DTW/LP method is implicitly associated with a specific class 
of Markov models and is equivalent to the probability maximization proce- 
dures for Gaussian autoregressive multivariate probabilistic functions of the 
underlying Markov model. This unified view offers insights into the effective- 
ness of the probabilistic models in speech recognition applications. 


I. INTRODUCTION 


Research in speech recognition has produced numerous algorithms 
and commercially available speech recognizers that all work to some 
extent.¹ Among these, temporal alignment techniques such as Dynamic 
Time-Warping (DTW) algorithms²⁻⁴ and Markov modeling⁵⁻⁷ are two 
prevailing approaches that are practical and theoretically sound. Both 
techniques explicitly address nonstationarity in speech signals. 
The two techniques, however, operate in different manners, as we will 
discuss briefly. 


* AT&T Bell Laboratories. 


Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted with- 
out payment of royalty provided that each reproduction is done without alteration and 
that the Journal reference and copyright notice are included on the first page. The title 
and abstract, but no other portions, of this paper may be copied or distributed royalty 
free by computer-based and other information-service systems without further permis- 
sion. Permission to reproduce or republish any other portion of this paper must be 
obtained from the Editor. 




In a recognition system employing dynamic time warping, a warping 
procedure based upon a prechosen, well-defined distortion measure 
aligns the unknown test speech sequence in turn to each reference 
sequence. The distortion measure must be a meaningful metric of 
dissimilarity between sound representations, usually the short-time 
spectra. The objective is to find a reference sequence of a known 
category or word that has the least dissimilarity to the test sequence 
after being optimally time aligned. Time alignment involves (time-) 
warping functions that are dynamic but deterministic representations 
of the possible variation of sound durations that are evident between 
the reference and the test sequences. There is, hence, one warping 
function that best matches the reference and test sequences, resulting 
in the smallest dissimilarity. The smallest dissimilarity measurement 
among all categories determines the recognition (recognition by min- 
imum distortion). 

Markov modeling techniques, while they may still perform sound 
pattern comparisons, do not require explicit time alignment. Instead, 
a probabilistic transition and observation structure is defined for each 
reference category or word. Such a structure, called a Markov model, 
includes (1) a state transition probability matrix, (2) an initial prob- 
ability vector, and (3) an observation probability matrix for discrete 
probability densities or a set of continuous densities defined by param- 
eter sets, or a mixture of the two when different types of densities are 
used. During recognition, one computes for each given reference model 
the probability of observing the test sequence. The model that produces 
the maximum observation probability is the classification result (rec- 
ognition by maximum probability). 

These two techniques are similar in theory, despite their very 
different operations and results. Confusion arising from this similarity 
often makes comparative studies of the two techniques difficult and 
unproductive. The purpose of this paper is, then, to give a unified 
tutorial view of the two techniques and to establish a theoretical link 
between them, so that more fruitful and meaningful comparisons can be 
made and each technique can benefit from the other. 

The paper is organized as follows. We first present statistical char- 
acteristics of Gaussian autoregressive sources in Section II, which 
serves as a foundation for the later developments. This topic is well 
studied in multivariate analysis, and an excellent treatment of it in 
the context of speech processing can be found in Ref. 8. In Section III 
we discuss maximum likelihood estimation of Gaussian autoregressive 
source parameters, and we explicitly demonstrate the relationship 
between some well-known probability density functions and distortion 
measures related to linear prediction (LP). In Section IV we discuss 
some fundamentals of probabilistic functions of Markov chains and 
their applications in speech recognition. We again show that distortion 
measures can be cast in the framework of probabilistic functions of 
Markov chains. We finally discuss dynamic time warping in Section 
V and show that dynamic time warping employing LP-related measures 
is equivalent to the recognition by maximum probability procedure, 
with some specific constraints. Theoretical similarities and differences 
between the two techniques are then discussed in detail to 
complete the attempted unified view. We start with Gaussian autoregressive 
sources because they are among the best known sources and are 
useful in speech research. More general measure-theoretic steps could 
have been taken to establish a formal theoretical link between the two 
methods. It is, nevertheless, our opinion that the present framework 
of Gaussian autoregressive sources adequately gives a meaningful 
unified view of the two methods. 


II. GAUSSIAN AUTOREGRESSIVE SOURCE 


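Consider a stationary, zero-mean, Gaussian signal source. The output 
of the source, subject to observation, is an N-sampled sequence 
$\{\tilde{s}_1, \tilde{s}_2, \ldots, \tilde{s}_N\}$, where each $\tilde{s}_i$ is a real random variable. The vector 
notation $\tilde{s}^t = [\tilde{s}_1\ \tilde{s}_2\ \cdots\ \tilde{s}_N] \in R^N$ denotes the observation. The 
probability density function of the random vector $\tilde{s}$ for known 
autocorrelation matrix $C_N$ is thus 

$$
f(s \mid C_N) = \lim_{\substack{\Delta s_i \to 0 \\ i = 1, 2, \ldots, N}}
\frac{\Pr\{s_1 \le \tilde{s}_1 \le s_1 + \Delta s_1,\ s_2 \le \tilde{s}_2 \le s_2 + \Delta s_2,\ \ldots,\ s_N \le \tilde{s}_N \le s_N + \Delta s_N \mid C_N\}}{\Delta s_1 \Delta s_2 \cdots \Delta s_N}
= (2\pi)^{-N/2}\, |C_N|^{-1/2} \exp\left\{-\tfrac{1}{2}\, s^t C_N^{-1} s\right\}, \tag{1}
$$

where $s^t = [s_1\ s_2\ \cdots\ s_N]$ is a realization of $\tilde{s}^t$, $C_N = [r_{ij}]_{i,j=1}^{N}$, and $r_{ij} = 
E\{\tilde{s}_i \tilde{s}_j\} = r_{|i-j|}$, due to stationarity. 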

The source is assumed to be Mth-order autoregressive with coefficients 
$a^t = [a_0\ a_1\ \cdots\ a_M]$, where $a_0$ is always unity. Hence, as shown in 
Fig. 1, the source can be equivalently viewed as a white Gaussian noise 
source with unity variance, followed by an all-pole filter $1/A(z)$, where 

$$A(z) = 1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_M z^{-M},$$

and an amplifier with multiplication factor $\sigma$. If we denote the output 
of such a white Gaussian noise source as $e_i$, $i = \ldots, -1, 0, 1, \ldots$, then 
the output of the filter and the overall signal source is 

$$x_i = -\sum_{j=1}^{M} a_j x_{i-j} + e_i \quad \text{for any } i \tag{2a}$$

and 

$$s_i = \sigma x_i, \tag{2b}$$

respectively. 

[Fig. 1: Gaussian autoregressive source. Block diagram beginning with a white Gaussian source of variance 1.]
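As an illustration of (2a) and (2b), the following is a minimal sketch of the source model, not code from the paper. The function name `ar_source` and the choice to treat samples before the start of the excitation as zero are assumptions made only for this example.

```python
import numpy as np

def ar_source(a, sigma, e):
    """Generate s_i = sigma * x_i with x_i = -sum_{j=1..M} a_j x_{i-j} + e_i.

    a     : coefficients [a_0, a_1, ..., a_M], with a_0 == 1
    sigma : gain of the amplifier in (2b)
    e     : excitation samples (white Gaussian noise in the paper's model)
    Assumption: x_i = 0 before the first excitation sample.
    """
    a = np.asarray(a, dtype=float)
    e = np.asarray(e, dtype=float)
    M = len(a) - 1
    x = np.zeros(len(e))
    for i in range(len(e)):
        acc = e[i]
        for j in range(1, M + 1):
            if i - j >= 0:
                acc -= a[j] * x[i - j]   # all-pole recursion of (2a)
        x[i] = acc
    return sigma * x                     # gain stage of (2b)

# Example: a first-order source driven by unit-variance Gaussian noise.
rng = np.random.default_rng(0)
s = ar_source([1.0, -0.9], sigma=2.0, e=rng.standard_normal(1000))
```

Driving the same function with a unit impulse instead of noise returns the scaled impulse response of $1/A(z)$, which is a convenient sanity check.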


For our interests in source identification, we wish to express the 
density function of (1) in terms of the source parameters $\sigma^2$ and $a$, 
that is, $f(s \mid a, \sigma^2)$, or equally importantly, we would like to obtain the 
gain-independent probability density $f(x \mid a)$. The difficulty here is that 
the relationship of (2) is not defined for the first M samples since we 
have only finite sample observation starting from $s_1$. However, using 
a classical (Gram-Schmidt) orthogonalization procedure for the first 
M samples, we may rewrite (2) as 

$$
\begin{aligned}
h_{11} s_1 &= \sigma e_1 \\
h_{21} s_1 + h_{22} s_2 &= \sigma e_2 \\
&\ \ \vdots \\
h_{M1} s_1 + h_{M2} s_2 + \cdots + h_{MM} s_M &= \sigma e_M \\
a_M s_1 + a_{M-1} s_2 + \cdots + a_1 s_M + s_{M+1} &= \sigma e_{M+1} \\
&\ \ \vdots \\
a_M s_{N-M} + a_{M-1} s_{N-M+1} + \cdots + a_1 s_{N-1} + s_N &= \sigma e_N .
\end{aligned}
$$

Denoting $e^t = [e_1, e_2, \ldots, e_M, e_{M+1}, \ldots, e_N]$, we have a system equation 
for the N observation samples: 

$$\sigma e = H s \tag{3a}$$
$$e = H x, \tag{3b}$$

where 

$$
H = \begin{bmatrix}
h_{11} &        &        &        &        &        &   \\
h_{21} & h_{22} &        &        &        & 0      &   \\
\vdots &        & \ddots &        &        &        &   \\
h_{M1} & h_{M2} & \cdots & h_{MM} &        &        &   \\
a_M    & a_{M-1} & \cdots & a_1   & 1      &        &   \\
       & \ddots &        &        & \ddots & \ddots &   \\
0      &        & a_M    & a_{M-1} & \cdots & a_1   & 1
\end{bmatrix}
= \begin{bmatrix} H_1 & 0 \\ H_2 & H_3 \end{bmatrix},
$$

in which $H_1$ is $M \times M$, $H_2$ is $(N-M) \times M$, and $H_3$ is $(N-M) \times (N-M)$. 

The elements of $e$ are uncorrelated, and the $h_{ij}$ are properly scaled so that 
$E[e_i e_j] = \delta_{ij}$ for any appropriate $i$ and $j$, thereby 
giving 

$$
\begin{aligned}
I_N &= E\{e e^t\} \\
    &= (\sigma^2)^{-1} H E\{s s^t\} H^t \\
    &= (\sigma^2)^{-1} H C_N H^t,
\end{aligned} \tag{4}
$$

where $I_N$ is the $N \times N$ identity matrix. Equation (4) leads to 

$$C_N^{-1} = (\sigma^2)^{-1} H^t H \tag{5}$$

and 

$$|C_N^{-1}| = |C_N|^{-1} = (\sigma^2)^{-N} |H|^2. \tag{6}$$

But, since $|H_3| = 1$ and $|H| = |H_1| \cdot |H_3|$, 

$$|C_N| = (\sigma^2)^{N} |H_1|^{-2}.$$

Note that matrix $H_1$ also corresponds to the diagonalization of the 
$M \times M$ autocorrelation matrix $C_M = [r_{ij}]$ for $i, j = 1, 2, \ldots, M$; i.e., 

$$I_M = (\sigma^2)^{-1} H_1 C_M H_1^t,$$

where $I_M$ is the $M \times M$ identity matrix. Therefore, 

$$|C_M| = (\sigma^2)^{M} |H_1|^{-2}$$

and 

$$|C_N| = (\sigma^2)^{N-M} |C_M|. \tag{7}$$


Given an autocorrelation matrix $C_M$, $|C_M|$ can be easily obtained 
by first diagonalizing $C_M$ using Cholesky decomposition or, more 
efficiently, Levinson's recursion algorithm,⁹ 

$$
B^t C_M B = \begin{bmatrix}
\beta_0 &         &        & 0 \\
        & \beta_1 &        &   \\
        &         & \ddots &   \\
0       &         &        & \beta_{M-1}
\end{bmatrix},
$$

where $B$ is an upper triangular matrix, the diagonal elements of which 
are all unity. Therefore, 

$$|C_M| = \prod_{i=0}^{M-1} \beta_i,$$

and as a result, 

$$|C_N| = (\sigma^2)^{N-M} \left( \prod_{i=0}^{M-1} \beta_i \right). \tag{8}$$


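Note that $\beta_i$ is equivalent to the minimum mean-square error resulting 
from an ith-order linear prediction of the signal. The probability 
density function of $s$ becomes 

$$f(s \mid C_N) = (2\pi)^{-N/2} (\sigma^2)^{-(N-M)/2} \left( \prod_{i=0}^{M-1} \beta_i \right)^{-1/2} \exp\{-s^t H^t H s / 2\sigma^2\}. \tag{9}$$

For the gain-independent expression, 

$$f(x \mid C_N) = (2\pi)^{-N/2} \left( \prod_{i=0}^{M-1} \frac{\beta_i}{\sigma^2} \right)^{-1/2} \exp\left\{-\tfrac{1}{2}\, x^t H^t H x\right\}. \tag{10}$$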

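We further write $x^t H^t H x$ explicitly as 

$$
x^t H^t H x = \left( \sum_{i=0}^{M} a_i^2 \right) \left( \sum_{j=1}^{N} x_j^2 \right)
+ 2 \sum_{i=1}^{M-1} \left( \sum_{j=0}^{M-i} a_j a_{j+i} \right) \left( \sum_{j=1}^{N-i} x_j x_{j+i} \right)
+ 2 (a_0 a_M) \left( \sum_{j=1}^{N-M} x_j x_{j+M} \right) + Q,
$$

where $Q$ represents terms that are negligible compared to the others for $N \gg M$. 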




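Letting 

$$r_a(i) = \sum_{j=0}^{M-i} a_j a_{j+i}, \tag{11}$$

$$r_x(i) \triangleq \sum_{j=1}^{N-i} x_j x_{j+i} = \frac{1}{\sigma^2} \sum_{j=1}^{N-i} s_j s_{j+i} = \frac{r_s(i)}{\sigma^2}, \tag{12}$$

and 

$$\alpha(x; a) = r_a(0) r_x(0) + 2 \sum_{i=1}^{M} r_a(i) r_x(i), \tag{13}$$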

we then have an approximation for the density function 

$$f(s \mid C_N) \approx (2\pi)^{-N/2} (\sigma^2)^{-(N-M)/2} \left( \prod_{i=0}^{M-1} \beta_i \right)^{-1/2} \exp\left\{-\tfrac{1}{2}\, \alpha(\sigma^{-1} s; a)\right\} \tag{14}$$

or 

$$f(x \mid C_N) \approx (2\pi)^{-N/2} \left( \prod_{i=0}^{M-1} \frac{\beta_i}{\sigma^2} \right)^{-1/2} \exp\left\{-\tfrac{1}{2}\, \alpha(x; a)\right\}. \tag{15}$$

This function can be evaluated easily if the source parameters are 
known. 
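The quantity $\alpha(x; a)$ of (13) needs only the first M + 1 autocorrelation lags of the observation and of the coefficient vector. The sketch below is illustrative only; the helper names `autocorr` and `alpha` are assumptions, and the code presumes N > M. If the observation is treated as zero outside its N samples, $\alpha(x; a)$ equals the energy of the full convolution of `a` with `x`.

```python
import numpy as np

def autocorr(v, max_lag):
    """Unnormalized autocorrelation r(i) = sum_j v_j v_{j+i}, i = 0..max_lag."""
    v = np.asarray(v, dtype=float)
    return np.array([np.dot(v[:len(v) - i], v[i:]) for i in range(max_lag + 1)])

def alpha(x, a):
    """alpha(x; a) = r_a(0) r_x(0) + 2 * sum_{i=1..M} r_a(i) r_x(i), as in (13)."""
    a = np.asarray(a, dtype=float)
    M = len(a) - 1
    r_a = autocorr(a, M)   # (11)
    r_x = autocorr(x, M)   # (12)
    return r_a[0] * r_x[0] + 2.0 * np.dot(r_a[1:], r_x[1:])

x = np.array([1.0, 2.0, -1.0, 0.5, 3.0])
a = np.array([1.0, -0.5, 0.25])
# Residual energy of the zero-extended sequence, computed directly.
direct = np.sum(np.convolve(a, x) ** 2)
```

Here `alpha(x, a)` and `direct` agree up to rounding, which is the sense in which $\alpha$ is a prediction-residual energy.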


III. MAXIMUM LIKELIHOOD 


In many realistic situations, such as dealing with speech signals, 
a priori information about the source is usually not available. 
Parameterization of speech signals for coding and recognition mainly 
involves estimating and identifying the source parameters from 
finite observations. More specifically, the following two dominant 
problems arise almost ubiquitously in speech analysis research: 

(1) Estimation: Given an observation $s$, what is the best or the most 
probable set of source parameters that led to the observation? 

(2) Identification: Given two observations $s_1$ and $s_2$, how close are the 
two observations? Are they close enough to be considered identical? 
Or, what is the probability that $s_2$ has been produced by the same 
source as $s_1$? 

The first problem has been studied extensively in 
statistical estimation theory as well as in (deterministic) least-squares 
time-series analysis. Research in identification, particularly in the 
field of speech processing, has resulted in several distance or distortion 
measures.¹⁰ Our main goal here is thus to integrate the formulation of 
the two problems in a probabilistic framework to better understand 
the probabilistic modeling techniques. In the following presentation, 
we shall focus on the maximum likelihood estimate of Gaussian 
autoregressive source parameters, as initiated in the previous section. 


3.1 Estimation—Autocorrelation method 


We discuss here only the autocorrelation method. For other vari- 
eties, we suggest that readers consult Refs. 9 and 11. 

The observation sequence $s = \{s_1, s_2, \ldots, s_N\}$ is assumed to be very 
long, i.e., $N \gg M$. It is further argued that observation of the source 
output, which is infinitely long, is made through some "smooth window," 
so that the edge problem at the beginning of the observed 
sequence is avoided, maintaining that 

$$e_i = \sum_{j=0}^{M} a_j x_{i-j} \quad \text{and} \quad s_i = \sigma x_i \quad \text{for all } i, \tag{16}$$

with $a_0 = 1$ and $x_i = 0$ for $i \le 0$ and for $i > N$. Hence, the diagonal 
elements in the $H$ matrix are assumed to be all unity. Equation (6) then 
becomes 

$$|C_N| = (\sigma^2)^{N},$$


and the probability density function, as in eq. (14), is now expressed 
as 

$$f(s \mid C_N) \approx (2\pi)^{-N/2} (\sigma^2)^{-N/2} \exp\left\{-\tfrac{1}{2}\, \alpha(\sigma^{-1} s; a)\right\}. \tag{17}$$

Since (17) is a function of $a$ and $\sigma^2$ and it closely approximates 
$f(s \mid C_N)$, we shall, in the following, define $f(s \mid a, \sigma^2)$ as (17). Furthermore, 
the gain-independent density function is thus 

$$f(x \mid a) = (2\pi)^{-N/2} \exp\left\{-\tfrac{1}{2}\, \alpha(x; a)\right\}. \tag{18}$$

It defines the probability density function of observing a vector $x$ at 
the output of an all-pole filter $1/A(z)$ driven by a unity variance 
Gaussian i.i.d. sequence. 

It is clear that, given an observation $s$, the maximum likelihood 
estimate of $a$ is the one that maximizes $f(s \mid a, \sigma^2)$ or, equivalently, 
minimizes $\alpha(s; a)$ because $\sigma$ here is only a scaling factor. In linear 
prediction terminology, the optimal $a^{(0)}$ is obtained by minimizing the 
prediction-error energy $\alpha(s; a)$, defined by (13), and $\alpha(s; a^{(0)}) = 
\min_a \alpha(s; a)$ is called the minimum residual energy. Furthermore, 
to maximize $f(s \mid a, \sigma^2)$ with respect to $\sigma^2$, the optimal estimate 
$\sigma^2_{(0)}$ is easily found to be 

$$\sigma^2_{(0)} = \alpha(s; a^{(0)}) / N, \tag{19}$$

which leads to 

$$\alpha(\sigma_{(0)}^{-1} s; a^{(0)}) = N. \tag{20}$$

This can be verified intuitively by recognizing that there are N valid 
uncorrelated error samples, $\sigma_{(0)} e_i = \sum_{j=0}^{M} a_j^{(0)} s_{i-j}$, $i = 1, 2, \ldots, N$, 
each having variance $\sigma^2_{(0)}$, resulting in an energy of $\alpha(s; a^{(0)}) = 
N \sigma^2_{(0)}$. 
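The estimation step can be sketched numerically as follows. This is an illustrative implementation of the autocorrelation method under the zero-extension assumption of (16), not the paper's own procedure: it solves the normal equations with a general linear solver rather than Levinson's recursion, and the names `autocorr` and `lpc_autocorrelation` are hypothetical.

```python
import numpy as np

def autocorr(v, max_lag):
    """Unnormalized autocorrelation r(i) = sum_j v_j v_{j+i}, i = 0..max_lag."""
    v = np.asarray(v, dtype=float)
    return np.array([np.dot(v[:len(v) - i], v[i:]) for i in range(max_lag + 1)])

def lpc_autocorrelation(s, M):
    """Return (a, sigma2, residual): the minimizer of alpha(s; a) with a_0 = 1,
    the gain estimate of (19), and the minimum residual energy."""
    s = np.asarray(s, dtype=float)
    N = len(s)
    r = autocorr(s, M)
    # Toeplitz matrix R[i, j] = r(|i - j|); alpha(s; a) equals a^t R a.
    lags = np.abs(np.arange(M + 1)[:, None] - np.arange(M + 1)[None, :])
    R = r[lags]
    a = np.ones(M + 1)
    # Setting the gradient with respect to a_1..a_M to zero gives the
    # normal equations R[1:, 1:] a[1:] = -R[1:, 0].
    a[1:] = np.linalg.solve(R[1:, 1:], -R[1:, 0])
    residual = float(a @ R @ a)      # alpha(s; a^(0))
    sigma2 = residual / N            # (19)
    return a, sigma2, residual
```

With these estimates, `residual / sigma2` equals N, which is the statement of (20).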


3.2 Identification 


In this section we establish the relationship between probability and 
distortion measures for identification purposes. In particular, for the 
present consideration of Gaussian autoregressive sources, we discuss 
the role of the Itakura-Saito measure and the likelihood ratio measure¹² 
in probability density functions. 


3.2.1 Itakura-Saito measure 


In maximum likelihood estimation, we often maximize the log 
likelihood, $\log\{f(s \mid a, \sigma^2)\}$, instead of $f(s \mid a, \sigma^2)$ for convenience, 
particularly when the probability density function is jointly Gaussian, 
as in the present case. The log likelihood takes the form 

$$\log\{f(s \mid a, \sigma^2)\} = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2}\, \alpha(\sigma^{-1} s; a). \tag{21}$$

Since $a^{(0)}$ and $\sigma^2_{(0)}$ are the maximum likelihood estimates based upon 
$s^{(0)}$, 

$$
\begin{aligned}
\log\{f(s^{(0)} \mid a^{(0)}, \sigma^2_{(0)})\}
&= -\frac{N}{2} \log(2\pi\sigma^2_{(0)}) - \frac{1}{2}\, \alpha(\sigma_{(0)}^{-1} s^{(0)}; a^{(0)}) \\
&= -\frac{N}{2} \log(2\pi\sigma^2_{(0)}) - \frac{N}{2} \\
&= \{\log f(s^{(0)} \mid a, \sigma^2)\}_{\max}.
\end{aligned} \tag{22}
$$


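The log likelihood difference between the maximum and other arbitrary 
values is thus 

$$
\begin{aligned}
L_d &= \{\log f(s^{(0)} \mid a, \sigma^2)\}_{\max} - \log f(s^{(0)} \mid a, \sigma^2) \\
    &= \log f(s^{(0)} \mid a^{(0)}, \sigma^2_{(0)}) - \log f(s^{(0)} \mid a, \sigma^2) \\
    &= -\frac{N}{2} \log(2\pi\sigma^2_{(0)}) - \frac{N}{2} + \frac{N}{2} \log(2\pi\sigma^2) + \frac{1}{2}\, \alpha(\sigma^{-1} s^{(0)}; a) \\
    &= \frac{N}{2} \left[ \frac{1}{N}\, \alpha(\sigma^{-1} s^{(0)}; a) + \log \sigma^2 - \log \sigma^2_{(0)} - 1 \right]
\end{aligned} \tag{23}
$$

because 

$$f(s^{(0)} \mid a, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\tfrac{1}{2}\, \alpha(\sigma^{-1} s^{(0)}; a)\right\}. \tag{24}$$

The bracketed term in (23) is, in fact, the well-known Itakura-Saito 
distortion measure between $\{a^{(0)}, \sigma^2_{(0)}\}$, representing $s^{(0)}$, and $\{a, \sigma^2\}$, 
representing another observation $s$; i.e., 

$$d_{IS}(s^{(0)}; s) = d_{IS}(s^{(0)}; \{a, \sigma^2\}) = \frac{1}{N}\, \alpha(\sigma^{-1} s^{(0)}; a) + \log \sigma^2 - \log \sigma^2_{(0)} - 1. \tag{25}$$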


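Therefore, the probability of observing $s^{(0)}$ at the output of a source 
with parameters $\{a, \sigma^2\}$ is, in terms of the distortion measure, 

$$
\begin{aligned}
f(s^{(0)} \mid a, \sigma^2) &= (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{N}{2} \left[ d_{IS}(s^{(0)}; \{a, \sigma^2\}) + \log \sigma^2_{(0)} - \log \sigma^2 + 1 \right]\right\} \\
&= G(\sigma^2, \sigma^2_{(0)}) \exp\left\{-\frac{N}{2}\, d_{IS}(s^{(0)}; \{a, \sigma^2\})\right\},
\end{aligned} \tag{26}
$$

where 

$$G(\sigma^2, \sigma^2_{(0)}) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{N}{2} \left[ \log \sigma^2_{(0)} - \log \sigma^2 + 1 \right]\right\}. \tag{27}$$
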
3.2.2 Likelihood ratio measure 


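In many situations, the desired identification is only based upon the 
autoregressive parameters $a$. This is equivalent to comparing two 
gain-normalized observation sequences. Let $\{a^{(0)}, \sigma^2_{(0)}\}$ and $\{a, \sigma^2\}$ be the 
maximum likelihood estimates corresponding to observations $s^{(0)}$ and 
$s$, respectively. The gain-normalized observations are then $x^{(0)} = s^{(0)}/\sigma_{(0)}$ 
and $x = s/\sigma$. From (18) and (20), the maximum log likelihood for 
$x^{(0)}$ is 

$$
\begin{aligned}
\log f(x^{(0)} \mid a^{(0)}) &= \{\log f(x^{(0)} \mid a)\}_{\max} \\
&= -\frac{N}{2} \log(2\pi) - \frac{1}{2}\, \alpha(x^{(0)}; a^{(0)}) \\
&= -\frac{N}{2} \log(2\pi) - \frac{N}{2}.
\end{aligned}
$$

But, since 

$$f(x^{(0)} \mid a) = (2\pi)^{-N/2} \exp\left\{-\tfrac{1}{2}\, \alpha(x^{(0)}; a)\right\}, \tag{28}$$

the log likelihood difference between the maximum and another arbitrary 
value is then 

$$
\begin{aligned}
L_d &= \{\log f(x^{(0)} \mid a)\}_{\max} - \log f(x^{(0)} \mid a) \\
    &= -\frac{N}{2} \log(2\pi) - \frac{N}{2} + \frac{N}{2} \log(2\pi) + \frac{1}{2}\, \alpha(x^{(0)}; a) \\
    &= \frac{N}{2} \left[ \frac{1}{N}\, \alpha(x^{(0)}; a) - 1 \right].
\end{aligned} \tag{29}
$$

The bracketed term above is the likelihood ratio measure widely 
employed in vector quantization vocoder designs; that is, 

$$
\begin{aligned}
d_{LR}(s^{(0)}; s) &\triangleq d_{LR}(x^{(0)}; x) \\
&= d_{LR}(x^{(0)}; a) \\
&= \frac{1}{N}\, \alpha(x^{(0)}; a) - 1.
\end{aligned} \tag{30}
$$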


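The likelihood ratio measure and the Itakura-Saito measure are closely 
related, 

$$
\begin{aligned}
d_{LR}(s^{(0)}; s) &= d_{LR}(x^{(0)}; x) \\
&= d_{LR}(x^{(0)}; a) \\
&= d_{IS}(x^{(0)}; x) \\
&= d_{IS}(x^{(0)}; a).
\end{aligned}
$$

As a result, the probability of observing $x^{(0)}$ at the output of an 
all-pole filter $1/A(z)$ driven by a unity variance Gaussian i.i.d. sequence 
is 

$$
\begin{aligned}
f(x^{(0)} \mid a) &= (2\pi)^{-N/2} \exp\left\{-\frac{N}{2} \left[ d_{LR}(x^{(0)}; a) + 1 \right]\right\} \\
&= P \exp\left\{-\frac{N}{2}\, d_{LR}(x^{(0)}; a)\right\},
\end{aligned} \tag{31}
$$

where 

$$P = (2\pi)^{-N/2} \exp\left\{-\frac{N}{2}\right\}. \tag{32}$$
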
Equations (26) and (31) are thus the fundamental link between 


probability and distortion measures. 
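The link expressed by (26) and (31) can be checked numerically. The sketch below is illustrative and uses hypothetical helper names (`d_is`, `d_lr`, `log_f`, `log_f_from_dis`); it evaluates the log density once directly from (24) and once through the Itakura-Saito measure of (25) and the factor $G$ of (27).

```python
import numpy as np

def autocorr(v, max_lag):
    """Unnormalized autocorrelation r(i) = sum_j v_j v_{j+i}, i = 0..max_lag."""
    v = np.asarray(v, dtype=float)
    return np.array([np.dot(v[:len(v) - i], v[i:]) for i in range(max_lag + 1)])

def alpha(x, a):
    """Residual energy alpha(x; a) of (13)."""
    a = np.asarray(a, dtype=float)
    M = len(a) - 1
    r_a, r_x = autocorr(a, M), autocorr(x, M)
    return r_a[0] * r_x[0] + 2.0 * np.dot(r_a[1:], r_x[1:])

def d_is(s0, a, sigma2, sigma2_0):
    """Itakura-Saito measure (25); alpha(s0 / sigma; a) = alpha(s0; a) / sigma2."""
    N = len(s0)
    return alpha(s0, a) / (N * sigma2) + np.log(sigma2) - np.log(sigma2_0) - 1.0

def d_lr(x0, a):
    """Likelihood ratio measure (30) for a gain-normalized observation x0."""
    return alpha(x0, a) / len(x0) - 1.0

def log_f(s0, a, sigma2):
    """Log of the density (24), evaluated directly."""
    N = len(s0)
    return -0.5 * N * np.log(2 * np.pi * sigma2) - 0.5 * alpha(s0, a) / sigma2

def log_f_from_dis(s0, a, sigma2, sigma2_0):
    """Log of (26): log G(sigma2, sigma2_0) - (N / 2) * d_IS."""
    N = len(s0)
    log_G = (-0.5 * N * np.log(2 * np.pi * sigma2)
             - 0.5 * N * (np.log(sigma2_0) - np.log(sigma2) + 1.0))
    return log_G - 0.5 * N * d_is(s0, a, sigma2, sigma2_0)
```

For any observation and any parameter pair, `log_f` and `log_f_from_dis` return the same value up to rounding, so ranking models by probability and ranking them by distortion lead to the same decision once $G$ is accounted for.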


3.3 Estimation and identification based upon multiple observations 


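We have presented parametric estimation of the probability density 
function based upon a single observation in the above. When several 
observations are available and known to be from the same source, the 
estimation turns out to be quite similar to the single-observation case. 

Let $s^{(i)}$, $i = 1, 2, \ldots, L$, denote the available observations. These 
observations are considered to be i.i.d. with probability density 

$$f(s^{(i)} \mid a, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\tfrac{1}{2}\, \alpha(\sigma^{-1} s^{(i)}; a)\right\}.$$

The joint probability density of observations $s^{(1)}, s^{(2)}, \ldots, s^{(L)}$ is thus 

$$
\begin{aligned}
f(s^{(1)}, s^{(2)}, \ldots, s^{(L)} \mid a, \sigma^2)
&= \prod_{i=1}^{L} f(s^{(i)} \mid a, \sigma^2) \\
&= \left[(2\pi\sigma^2)^{-N/2}\right]^{L} \exp\left\{-\frac{1}{2} \sum_{i=1}^{L} \alpha(\sigma^{-1} s^{(i)}; a)\right\}.
\end{aligned} \tag{33}
$$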


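As with the single-observation case, the maximum likelihood estimate 
requires minimization of $\sum_{i=1}^{L} \alpha(\sigma^{-1} s^{(i)}; a)$. But maximizing 
$f(s^{(1)}, s^{(2)}, \ldots, s^{(L)} \mid a, \sigma^2)$ is, in our interest here, equivalent to 
maximizing $[f(s^{(1)}, s^{(2)}, \ldots, s^{(L)} \mid a, \sigma^2)]^{1/L}$. Since 

$$[f(s^{(1)}, s^{(2)}, \ldots, s^{(L)} \mid a, \sigma^2)]^{1/L} = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2} \cdot \frac{1}{L} \sum_{i=1}^{L} \alpha(\sigma^{-1} s^{(i)}; a)\right\}, \tag{34}$$

the similarity between multiple- and single-observation estimation can 
easily be seen by comparing (17) and (34). From (13), 

$$
\begin{aligned}
\frac{1}{L} \sum_{i=1}^{L} \alpha(\sigma^{-1} s^{(i)}; a)
&= \frac{1}{\sigma^2} \cdot \frac{1}{L} \sum_{i=1}^{L} \left[ r_a(0) r_s^{(i)}(0) + 2 \sum_{j=1}^{M} r_a(j) r_s^{(i)}(j) \right] \\
&= \frac{1}{\sigma^2} \left\{ r_a(0) \left[ \frac{1}{L} \sum_{i=1}^{L} r_s^{(i)}(0) \right] + 2 \sum_{j=1}^{M} r_a(j) \left[ \frac{1}{L} \sum_{i=1}^{L} r_s^{(i)}(j) \right] \right\},
\end{aligned} \tag{35}
$$

where 

$$r_s^{(i)}(j) = \sum_{n=1}^{N-j} s_n^{(i)} s_{n+j}^{(i)},$$

and $s_n^{(i)}$ is simply the nth sample in $s^{(i)}$. Equations (34) and (35) clearly 
demonstrate that the maximum likelihood estimate of the source 
parameters, given multiple observations, can be obtained by the same 
minimization procedure as in the single-observation situation, using 
the autocorrelation coefficients averaged over all available observations. 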

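The same result holds for the gain-independent case where the joint 
density for observations $x^{(1)} = s^{(1)}/\sigma_{(1)}$, $x^{(2)} = s^{(2)}/\sigma_{(2)}$, $\ldots$, $x^{(L)} = 
s^{(L)}/\sigma_{(L)}$ is, after taking the Lth root, 

$$[f(x^{(1)}, x^{(2)}, \ldots, x^{(L)} \mid a)]^{1/L} = (2\pi)^{-N/2} \exp\left\{-\frac{1}{2} \left[ \frac{1}{L} \sum_{i=1}^{L} \alpha(x^{(i)}; a) \right]\right\}. \tag{36}$$

Note that gain independence is maintained by normalizing each 
observation $s^{(i)}$ with its own estimate $\sigma_{(i)}$. Equation (35) thus becomes 

$$
\begin{aligned}
\frac{1}{L} \sum_{i=1}^{L} \alpha(x^{(i)}; a)
&= r_a(0) \left[ \frac{1}{L} \sum_{i=1}^{L} r_x^{(i)}(0) \right] + 2 \sum_{j=1}^{M} r_a(j) \left[ \frac{1}{L} \sum_{i=1}^{L} r_x^{(i)}(j) \right] \\
&= r_a(0) \left[ \frac{1}{L} \sum_{i=1}^{L} \frac{r_s^{(i)}(0)}{\sigma^2_{(i)}} \right] + 2 \sum_{j=1}^{M} r_a(j) \left[ \frac{1}{L} \sum_{i=1}^{L} \frac{r_s^{(i)}(j)}{\sigma^2_{(i)}} \right].
\end{aligned} \tag{37}
$$
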
Equations (35) and (37) lead to the same optimization procedure as 
the centroid computation in code-book design for vector quantiza- 
tion.* In centroid computation, however, the objective is to minimize 
the average distortion, while in the current probabilistic framework, 
the probability is to be maximized. The equivalence between proba- 
bility and distortion measures is again witnessed. 

Once the source parameters are estimated, the probability density 
is defined just as in the single-observation case, and expressed in (26) 
or (31) in terms of distortion measures. 


IV. PROBABILISTIC FUNCTIONS OF MARKOV CHAINS 


Consider a first-order K-state Markov chain governed by a transition 
probability matrix $V = [v_{ij}]$, $i, j = 1, 2, \ldots, K$, and an initial 
probability vector $u^t = [u_1, u_2, \ldots, u_K]$. Obviously, 

$$\sum_{j=1}^{K} u_j = 1, \quad u_j \ge 0 \quad \text{for all } j \tag{38}$$

and 

$$\sum_{j=1}^{K} v_{ij} = 1 \quad \text{for any } i, \tag{39}$$

because $v_{ij}$ is the probability of making a transition from state $i$ to 
state $j$ given that the current state is $i$. For any integer state sequence 
$\Theta = \theta_0 \theta_1 \cdots \theta_T$, where $\theta_t \in \{1, 2, \ldots, K\}$, the probability of $\Theta$ being 
generated by the Markov chain can be easily calculated by 

$$\Pr(\Theta \mid V, u) = u_{\theta_0} v_{\theta_0 \theta_1} v_{\theta_1 \theta_2} \cdots v_{\theta_{T-1} \theta_T}. \tag{40}$$


Now suppose $\Theta = \theta_0 \theta_1 \cdots \theta_T$ cannot be observed directly. Instead, we 
observe a stochastic process $S = s_1 s_2 \cdots s_T$, produced by an underlying 
state sequence $\theta_0 \theta_1 \theta_2 \cdots \theta_T$. Each state, say $i$, manifests itself through a 
probability density function $f_i(s)$. We use $F = \{f_i(\cdot)\}$ to denote such a 
set of density functions. The probability density of observing $S = 
s_1 s_2 \cdots s_T$ given a specific state sequence $\Theta$ generated by the Markov 
chain with transition probability matrix $V$ and initial probability $u$ is 
thus 

$$f(S \mid \Theta, V, u, F) = f_{\theta_1}(s_1) f_{\theta_2}(s_2) \cdots f_{\theta_T}(s_T). \tag{41}$$

Each $s_t$ here is a vector. It follows that the probability 
density of observing $S$ given $V$ and $u$ is 
$$
\begin{aligned}
f(S \mid V, u, F) &= \sum_{\text{all } \Theta} f(S, \Theta \mid V, u, F) \\
&= \sum_{\text{all } \Theta} f(S \mid \Theta, V, u, F) \Pr(\Theta \mid V, u, F) \\
&= \sum_{\text{all } \Theta} f(S \mid \Theta, V, u, F) \Pr(\Theta \mid V, u) \\
&= \sum_{\theta_0, \theta_1, \ldots, \theta_T = 1}^{K} u_{\theta_0} v_{\theta_0 \theta_1} f_{\theta_1}(s_1) v_{\theta_1 \theta_2} f_{\theta_2}(s_2) \cdots v_{\theta_{T-1} \theta_T} f_{\theta_T}(s_T).
\end{aligned} \tag{42}
$$

The stochastic process $S$ is characterized by the density $f(S \mid V, u, F)$ 
and the set of probability density functions $F$, which is assumed to be 
known and independent of the Markov chain in the above. The triple 
$(V, u, F) \triangleq M$ is then called a (hidden) Markov model,¹⁴ and the 
conditional density for the stochastic process $S$ may be written as 
$f(S \mid M)$. 

The application of hidden Markov models in speech recognition can 
now be formulated. It is treated as a classification problem. We wish 
to recognize utterances known to have been selected from some vocabulary 
$W$ of $B$ words $W^{(1)}, W^{(2)}, \ldots, W^{(B)}$. (We use "words" here for 
convenience. They may not be words in the traditional sense, but 
merely some lengths of speech utterances.) Every word $W^{(i)}$ is represented 
by a model $M_i$. An observation sequence $S = s_1 s_2 \cdots s_T$ of an 
unknown word is given. We then apply the maximum likelihood rule 
and classify $S$ as word $W^{(i)}$ iff 

$$f(S \mid M_i) \ge f(S \mid M_j) \quad \text{for any } j = 1, 2, \ldots, B.$$

Such an application presents two problems: evaluating $f(S \mid M_i)$ and 
estimating the model $M$ that maximizes the likelihood of a given 
observation $S$. Similar problems were treated in Sections II and III. 

The computational load in evaluating $f(S \mid M)$ appears to be exponential 
in $T$, as we see from (42), which is a sum over all possible state 
sequences of length $T$. With the so-called forward-backward algorithm 
by Baum,¹⁵ however, it is only linear in $T$. The estimation of model 
parameters $V$, $u$, and $F$, on the other hand, is less straightforward, 
and no closed form solution has been found so far. An iterative 
reestimation algorithm by Baum¹⁵ and Baum et al.¹⁶ is usually employed 
to attack this estimation problem for a certain class of hidden 
Markov models, including those with Gaussian autoregressive densities 
of (1), (17), or (18). We shall briefly discuss the forward-backward 
algorithm and the reestimation formula for Markov models with 
Gaussian autoregressive density. Similar developments with applications 
in speaker identification can be found in the work of Poritz.¹⁷ 
More rigorous developments of these techniques, as well as their 
theoretical verifications, can be found in Refs. 15 and 16. 
4.1 Forward-backward recursion and trellis structure 

Define the forward probabilities $\xi_0(i) = u_i$, $i = 1, 2, \ldots, K$, and 

$$\xi_t(i) = \sum_{j=1}^{K} \xi_{t-1}(j) v_{ji} f_i(s_t) \tag{43}$$

for $i = 1, 2, \ldots, K$ and $t = 1, 2, \ldots, T$. Clearly, 

$$\xi_t(i) = f(s_1, s_2, \ldots, s_t, \theta_t = i \mid M).$$

Similarly, define backward probabilities 

$$
\begin{aligned}
\eta_t(i) &= f(s_{t+1}, s_{t+2}, \ldots, s_T \mid \theta_t = i, M) \\
&= \sum_{j=1}^{K} \eta_{t+1}(j) v_{ij} f_j(s_{t+1})
\end{aligned} \tag{44}
$$

and $\eta_T(i) = 1$, for $i = 1, 2, \ldots, K$, and $t = T-1, T-2, \ldots, 0$. $\xi_t(i)$ 
and $\eta_t(i)$ satisfy 

$$\xi_t(i) \eta_t(i) = f(S, \theta_t = i \mid M). \tag{45}$$

Therefore, we have 

$$
\begin{aligned}
f(S \mid M) &= \sum_{i=1}^{K} f(S, \theta_t = i \mid M) \\
&= \sum_{i=1}^{K} \xi_t(i) \eta_t(i)
\end{aligned} \tag{46}
$$

for any $t$. In particular, by letting $t = T$, 

$$f(S \mid M) = \sum_{i=1}^{K} \xi_T(i) \tag{47}$$

so that $f(S \mid M)$ can be evaluated from forward probabilities alone and 
the computation load is thus linear in $T$. 
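The recursions (43) and (44) can be sketched as follows. This is an illustrative implementation, not code from the paper. It assumes the state-conditional densities have already been evaluated into a hypothetical array `B` with `B[t - 1, i]` holding $f_i(s_t)$, it uses 0-based state indices, and it omits the scaling normally needed to avoid underflow on long sequences. A brute-force evaluation of (42) is included for comparison on small problems.

```python
import itertools
import numpy as np

def forward(u, V, B):
    """xi[t, i] = f(s_1..s_t, theta_t = i | M), with xi[0] = u, as in (43)."""
    T, K = B.shape
    xi = np.zeros((T + 1, K))
    xi[0] = u
    for t in range(1, T + 1):
        xi[t] = (xi[t - 1] @ V) * B[t - 1]
    return xi

def backward(V, B):
    """eta[t, i] = f(s_{t+1}..s_T | theta_t = i, M), with eta[T] = 1, as in (44)."""
    T, K = B.shape
    eta = np.ones((T + 1, K))
    for t in range(T - 1, -1, -1):
        eta[t] = V @ (B[t] * eta[t + 1])
    return eta

def brute_force(u, V, B):
    """Direct sum of (42) over all K**(T+1) state sequences (tiny cases only)."""
    T, K = B.shape
    total = 0.0
    for theta in itertools.product(range(K), repeat=T + 1):
        p = u[theta[0]]
        for t in range(1, T + 1):
            p *= V[theta[t - 1], theta[t]] * B[t - 1, theta[t]]
        total += p
    return total
```

Summing `forward(u, V, B)[-1]` gives $f(S \mid M)$ as in (47), and the row sums of `forward(...) * backward(...)` give the same value for every $t$, as in (46).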

This forward-backward evaluation technique takes advantage of a 
trellis structure to reduce the computational burden. The complexity 
of a tree structure, as Fig. 2 illustrates, grows exponentially in 
$T$. It treats distinct paths differently, as if at instant $t$ the number 
of available state indices were $K^t$, as noted by the parenthesized 
index in Fig. 2. The original evaluation formula of (42) displays a tree 
structure since the summation is directly over all $\theta_i$, $i = 0, 1, 2, \ldots, T$, 
in the range of 1 through $K$. This tree structure can be transformed 
easily into a trellis structure, as Fig. 3 depicts for $K = 4$, by recognizing 
the fact that at any instant $t$ and any state $\theta_t$ there are always only 
$K$ possible next states regardless of the past transition history. The 
branches in the tree structure merge into $K$ nodes (states) at every 
instant. The definition of the forward probabilities can be stated as 
follows: the density of the event that $\theta_t = i$ and $s_1, s_2, \ldots, s_t$ are 
observed is obtained by summing, over all $j$, the densities of the event 
that $\theta_{t-1} = j$ and $s_1, s_2, \ldots, s_{t-1}$ are observed, multiplied by the 
transition probability from state $j$ to $i$ and by the density of $s_t$ at 
state $i$. Thus, as Fig. 3 shows for $K = 4$, $\xi_t(1) = \xi_{t-1}(1) v_{11} f_1(s_t) + 
\xi_{t-1}(2) v_{21} f_1(s_t) + \xi_{t-1}(3) v_{31} f_1(s_t) + \xi_{t-1}(4) v_{41} f_1(s_t)$. Every $\xi_t(i)$ can be 
computed recursively from $\xi_{t-1}(i)$, $i = 1, 2, \ldots, K$. Each progression 
from $t - 1$ to $t$ requires the same amount of computation and, therefore, 
total computation is linear in $T$. 

[Fig. 2: Tree structure showing exponential growth of complexity.]

The tree/trellis structure also shows that dynamic programming 
techniques such as the Viterbi algorithm or the (M, L) algorithm, etc., 
can be employed efficiently to find a particular path $\Theta_{opt}$ such that 
$f(S, \Theta_{opt} \mid M) = \max_{\Theta} f(S, \Theta \mid M)$. Later we will show that dynamic time 
warping displays similar requirements of finding an optimal path (the 
warping function) to minimize the accumulative distortion. 

[Fig. 3: Trellis structure for evaluating probability in hidden Markov models, drawn for K = 4 states at instants t - 1, t, and t + 1.]
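A search for the best single path on the trellis can be sketched with the Viterbi recursion. The code below is illustrative only and works in the log domain. As an assumption for this example, `B[t - 1, i]` holds the already evaluated density $f_i(s_t)$, states are 0-based, and the returned path includes the unobserved initial state $\theta_0$.

```python
import numpy as np

def viterbi(u, V, B):
    """Return (path, log_density) maximizing f(S, Theta | M) over Theta.

    u : (K,) initial probabilities, V : (K, K) transition matrix,
    B : (T, K) array with B[t - 1, i] = f_i(s_t) (assumed precomputed).
    """
    T, K = B.shape
    with np.errstate(divide="ignore"):          # log(0) = -inf is acceptable here
        log_u, log_V, log_B = np.log(u), np.log(V), np.log(B)
    delta = np.zeros((T + 1, K))                # best log score ending in each state
    back = np.zeros((T + 1, K), dtype=int)      # best predecessor of each state
    delta[0] = log_u
    for t in range(1, T + 1):
        scores = delta[t - 1][:, None] + log_V  # scores[j, i]: transition j -> i
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(K)] + log_B[t - 1]
    path = np.zeros(T + 1, dtype=int)
    path[T] = int(np.argmax(delta[T]))
    for t in range(T, 0, -1):                   # trace the best path backwards
        path[t - 1] = back[t, path[t]]
    return path, float(delta[T, path[T]])
```

The only structural change from the forward recursion is that the sum over predecessor states is replaced by a maximum, which is why the same trellis serves both computations.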


4.2 Reestimation—Baum-Welch algorithm 


We discuss here only the estimation of Markov models with Gaussian 
autoregressive densities; that is, in $F = \{f_i\}_{i=1}^{K}$, every $f_i$ takes the 
form of (17)* and is characterized by parameters $(a_i, \sigma_i^2)$. Equivalently, 
we write $F = \{(a_i, \sigma_i^2)\}_{i=1}^{K}$. The objective of model estimation here is to 
find a model $M = (V, u, F)$ that maximizes the likelihood of a given 
sequence $S$, for a fixed number of states $K$ and order of autoregression 
$M$. 

Given a sequence $S$ and an arbitrary model $M$, the Baum-Welch 
reestimation algorithm iteratively finds another model $M' = (V', u', 
(a_i', \sigma_i'^2))$ that leads to $f(S \mid M') \ge f(S \mid M)$. The algorithm continually 
improves the estimate and converges to a local optimum. Let 

$$
\begin{aligned}
\gamma_t(i) &\triangleq f(S, \theta_t = i \mid M) \\
&= \xi_t(i) \cdot \eta_t(i)
\end{aligned} \tag{48}
$$


*The gain-independent case can be easily developed in the same way that the 
previous sections showed. 
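for $t = 0, 1, 2, \ldots, T$, and $i = 1, 2, \ldots, K$. Further, define 

$$
\begin{aligned}
\gamma_t(i, j) &\triangleq f(S, \theta_{t-1} = i, \theta_t = j \mid M) \\
&= \xi_{t-1}(i) v_{ij} f_j(s_t) \eta_t(j)
\end{aligned} \tag{49}
$$

for $i, j = 1, 2, \ldots, K$, and $t = 1, 2, \ldots, T$. A new estimate of transition 
probability $v_{ij}'$ for $V'$ is obtained by 

$$v_{ij}' = \frac{\sum_{t=1}^{T} \gamma_t(i, j)}{\sum_{t=1}^{T} \gamma_{t-1}(i)}, \tag{50}$$

where the denominator equals the numerator summed over $j$, since 
$\sum_{j} \gamma_t(i, j) = \gamma_{t-1}(i)$, so that each row of $V'$ sums to one. 
And by applying Theorem 3.1 of Ref. 16 one chooses $(a_i', \sigma_i'^2)$, for each 
$i = 1, 2, \ldots, K$, such that (see Appendix A) 

$$\alpha(\tilde{S}_i; a_i') = \min_{a} \alpha(\tilde{S}_i; a) \tag{51}$$

and 

$$\sigma_i'^2 = \frac{\alpha(\tilde{S}_i; a_i')}{N \sum_{t=1}^{T} \gamma_t(i)}, \tag{52}$$

where $\tilde{S}_i$ represents a composite observation whose autocorrelation 
coefficients are $\tilde{r}_i(j)$, 

$$
\begin{aligned}
\tilde{r}_i(j) &= \sum_{t=1}^{T} \gamma_t(i) r_t(j) \\
&= \sum_{t=1}^{T} \gamma_t(i) \sum_{n=1}^{N-j} s_{t,n} s_{t,n+j}.
\end{aligned} \tag{53}
$$

Note that $s_{t,n}$ is the nth sample in observation $s_t$. The composite 
autocorrelation $\tilde{r}_i(j)$ is, as seen from (53), a weighted average 
autocorrelation. The weight is the density or relative frequency of the 
observation $S$ being at state $i$ at instant $t$. The concept of relative frequency 
may be helpful in relating (53) to (35), where a uniform average is 
involved. After $\tilde{r}_i(j)$ for $i = 1, 2, \ldots, K$, and $j = 0, 1, 2, \ldots, M$ have 
been calculated, each $(a_i', \sigma_i'^2)$ is found by using the same maximum 
likelihood estimation procedure as in Section III. 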
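The transition update of (50) can be sketched as below. This is an illustrative, unscaled implementation under several assumptions: the densities are precomputed in a hypothetical array `B` with `B[t - 1, i]` holding $f_i(s_t)$, states are 0-based, only $V$ is reestimated, and the denominator sums $\gamma_{t-1}(i)$ over $t = 1, \ldots, T$ so that each row of the new matrix sums to one. The returned `gamma` array holds the weights that enter the composite autocorrelation of (53).

```python
import numpy as np

def reestimate_transitions(u, V, B):
    """One Baum-Welch update of V; returns (V_new, gamma).

    gamma[t, i] = xi_t(i) * eta_t(i) = f(S, theta_t = i | M), as in (48).
    """
    T, K = B.shape
    xi = np.zeros((T + 1, K))
    eta = np.ones((T + 1, K))
    xi[0] = u
    for t in range(1, T + 1):                  # forward pass, (43)
        xi[t] = (xi[t - 1] @ V) * B[t - 1]
    for t in range(T - 1, -1, -1):             # backward pass, (44)
        eta[t] = V @ (B[t] * eta[t + 1])
    gamma = xi * eta
    num = np.zeros((K, K))
    for t in range(1, T + 1):
        # gamma_t(i, j) = xi_{t-1}(i) v_ij f_j(s_t) eta_t(j), as in (49)
        num += xi[t - 1][:, None] * V * (B[t - 1] * eta[t])[None, :]
    den = gamma[:T].sum(axis=0)                # sum over t = 1..T of gamma_{t-1}(i)
    return num / den[:, None], gamma
```

Because `gamma[-1].sum()` is $f(S \mid M)$, calling the function again with the updated matrix gives a direct check that the likelihood has not decreased.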

In estimating the Markov model parameters based upon multiple 
observations $S^{(1)}, S^{(2)}, \ldots, S^{(L)}$, we try to maximize the joint density 
given the same model, $f(S^{(1)}, S^{(2)}, \ldots, S^{(L)} \mid M)$. Since the $S^{(l)}$ are 
independently observed, 


1230 TECHNICAL JOURNAL, SEPTEMBER 1984 


$$f(S^{(1)}, S^{(2)}, \ldots, S^{(L)} \mid M) = \prod_{l=1}^{L} f(S^{(l)} \mid M). \tag{54}$$
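The Baum-Welch algorithm can be equally applied and the key equation 
of (50) for the new estimate of transition probability becomes (see 
Appendix B) 

$$v_{ij}' = \frac{\sum_{t=1}^{T} \sum_{l=1}^{L} \gamma_t^{(l)}(i, j)}{\sum_{t=1}^{T} \sum_{l=1}^{L} \gamma_{t-1}^{(l)}(i)}, \tag{55}$$

where 

$$\gamma_t^{(l)}(i) = \frac{f(S^{(l)}, \theta_t = i \mid M)}{f(S^{(l)} \mid M)} \tag{56}$$

and 

$$\gamma_t^{(l)}(i, j) = \frac{f(S^{(l)}, \theta_{t-1} = i, \theta_t = j \mid M)}{f(S^{(l)} \mid M)}. \tag{57}$$

The new estimate of $(a_i', \sigma_i'^2)$ of (51) and (52) then follows the same 
procedure as previously discussed for the maximum likelihood estimate 
based upon multiple observations. In particular, one can show (see 
Appendix B) that the new improved estimate $(a_i', \sigma_i'^2)$, according to 
the reestimation algorithm, satisfies 

$$\alpha(\tilde{S}_i; a_i') = \min_{a} \alpha(\tilde{S}_i; a) \tag{58}$$

and 

$$\sigma_i'^2 = \frac{\alpha(\tilde{S}_i; a_i')}{N \sum_{t=1}^{T} \sum_{l=1}^{L} \gamma_t^{(l)}(i)}, \tag{59}$$

where $\tilde{S}_i$ now represents a composite observation whose autocorrelation 
coefficients are $\tilde{r}_i(j)$, 

$$
\begin{aligned}
\tilde{r}_i(j) &= \sum_{t=1}^{T} \sum_{l=1}^{L} \gamma_t^{(l)}(i) r_t^{(l)}(j) \\
&= \sum_{t=1}^{T} \sum_{l=1}^{L} \gamma_t^{(l)}(i) \sum_{n=1}^{N-j} s_{t,n}^{(l)} s_{t,n+j}^{(l)},
\end{aligned} \tag{60}
$$

with $s_{t,n}^{(l)}$ being the nth sample in the observation vector $s_t$ of sequence 
$S^{(l)}$. 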
V. DYNAMIC TIME WARPING AND HIDDEN MARKOV MODELS 


In this section, we establish the relationship between dynamic time 
warping using linear predictive coding (LPC) distance measures and 
hidden Markov models with Gaussian autoregressive densities. We 
give a unified view on these two techniques such that comparative 
discussions on the two techniques can be made easily. 

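Consider two speech sequences, $W = w_1 w_2 \cdots w_{T_w}$ and $Y = y_1 y_2 \cdots 
y_{T_y}$, called the reference and the test sequence, respectively. Through 
some warping function $\phi(\cdot)$, or $\psi(\cdot)$, 

$$t_y = \phi(t_w), \quad t_w = 1, 2, \ldots, T_w \tag{61}$$
$$t_w = \psi(t_y), \quad t_y = 1, 2, \ldots, T_y, \tag{62}$$

a correspondence between $W$ and $Y$ can be established. Let $d[\cdot\,;\cdot]$ be 
a distortion measure. The conventional dynamic time warping uses 
dynamic programming techniques to determine $\phi$ or $\psi$ such that 

$$D_\phi = \sum_{t_w=1}^{T_w} d[w_{t_w}; y_{\phi(t_w)}] \tag{63}$$

or 

$$D_\psi = \sum_{t_y=1}^{T_y} d[y_{t_y}; w_{\psi(t_y)}] \tag{64}$$

is minimized. In recognition, we classify an utterance $Y$ as word $W^{(i)}$ 
in a vocabulary $\{W^{(1)}, W^{(2)}, \ldots, W^{(B)}\}$ if 

$$
[D_\phi^{(i)}]_{\min} = \min_{\phi} \left\{ \sum_{t_w=1}^{T_w^{(i)}} d[w_{t_w}^{(i)}; y_{\phi(t_w)}] \right\}
\le \min_{\phi} \left\{ \sum_{t_w=1}^{T_w^{(j)}} d[w_{t_w}^{(j)}; y_{\phi(t_w)}] \right\} \quad \text{for } j = 1, 2, \ldots, B \tag{65}
$$

or 

$$
[D_\psi^{(i)}]_{\min} = \min_{\psi} \left\{ \sum_{t_y=1}^{T_y} d[y_{t_y}; w_{\psi(t_y)}^{(i)}] \right\}
\le \min_{\psi} \left\{ \sum_{t_y=1}^{T_y} d[y_{t_y}; w_{\psi(t_y)}^{(j)}] \right\} \quad \text{for } j = 1, 2, \ldots, B. \tag{66}
$$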


The choice of warping directions, i.e., using ¢ or ¢, is rather arbitrary 
and often is taken into consideration together with some continuity 
conditions.”° 
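The minimization in (64) and the decision rule of (66) can be sketched with a small dynamic program. This is only an illustration: the scalar frames, the squared-difference distortion used in the usage note, the fixed endpoints, and the step constraint (the warped index may advance by 0, 1, or 2 per test frame) are assumptions chosen for brevity, not the continuity conditions of any particular recognizer.

```python
def dtw(test, ref, dist, max_jump=2):
    """Minimise sum_t dist(test[t], ref[zeta(t)]) over warping functions zeta.

    Assumed constraints (not taken from the text): zeta(0) = 0,
    zeta(last) = len(ref) - 1, and 0 <= zeta(t) - zeta(t-1) <= max_jump.
    """
    INF = float("inf")
    Ty, Tw = len(test), len(ref)
    D = [[INF] * Tw for _ in range(Ty)]   # D[t][j]: best cost with zeta(t) = j
    D[0][0] = dist(test[0], ref[0])
    for t in range(1, Ty):
        for j in range(Tw):
            best = min(D[t - 1][k] for k in range(max(0, j - max_jump), j + 1))
            if best < INF:
                D[t][j] = best + dist(test[t], ref[j])
    return D[Ty - 1][Tw - 1]


def classify(test, references, dist):
    """Index of the reference with the smallest warped distortion, as in (66)."""
    scores = [dtw(test, ref, dist) for ref in references]
    return min(range(len(scores)), key=scores.__getitem__)
```

For example, with `dist = lambda a, b: (a - b) ** 2`, the test sequence `[1, 1, 2, 3, 3]` aligns to the reference `[1, 2, 3]` with zero accumulated distortion, so `classify` prefers it over the reversed reference `[3, 2, 1]`.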

Now, consider a $T_w$-state hidden Markov model $M_w = (V_w, u_w, F_w)$,
where $V_w$ is a $T_w \times T_w$ matrix $V_w = [v_{ij}]$, $v_{ij} = 1/T_w$ for any $i, j = 1,
2, \ldots, T_w$; $u_w$ is a $T_w$-dimensional vector with $u_i = 1/T_w$ for any $i =
1, 2, \ldots, T_w$; and $F_w$ is the set of Gaussian autoregressive densities
$\{f_i\}_{i=1}^{T_w}$. Each $f_i$ is defined by a parameter pair $(a_{w,i}, \sigma_{w,i}^2)$, which is the
maximum likelihood estimate based upon observation $w_i$. We further
consider a particular state sequence, called the progressive sequence,
$\Theta_w = \theta_0 \theta_1 \ldots \theta_{T_w}$, where $\theta_i = i$ and where $\theta_0 \in \{1, 2, \ldots, T_w\}$ is arbitrary.
Then,

$$f(W, \Theta_w \mid M_w) = (1/T_w)^{T_w + 1} \prod_{i=1}^{T_w} f_i(w_i \mid a_{w,i}, \sigma_{w,i}^2)$$

and from (22),

$$\log f(W, \Theta_w \mid M_w)
= -\frac{N}{2} \left[ \sum_{i=1}^{T_w} \log(2\pi\sigma_{w,i}^2) + T_w \right] - (T_w + 1) \log T_w
= \max_{\Theta \in \{\Theta\}_{(T_w)}} \{\log f(W, \Theta \mid M_w)\}, \tag{67}$$

where $\{\Theta\}_{(T_w)}$ denotes the set of all state sequences with length $T_w$.
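We define a similar model $M_y = (V_y, u_y, F_y)$ for $Y$, in which $v_{ij} =
1/T_y$, $u_i = 1/T_y$ for $i, j = 1, 2, \ldots, T_y$, and $F_y = \{(a_{y,i}, \sigma_{y,i}^2)\}_{i=1}^{T_y}$, where
each $(a_{y,i}, \sigma_{y,i}^2)$ is estimated based on $y_i$. We thus also have

$$f(Y, \Theta_y \mid M_y) = (1/T_y)^{T_y + 1} \prod_{i=1}^{T_y} f_i(y_i \mid a_{y,i}, \sigma_{y,i}^2)$$

and

$$\log f(Y, \Theta_y \mid M_y)
= -\frac{N}{2} \left[ \sum_{i=1}^{T_y} \log(2\pi\sigma_{y,i}^2) + T_y \right] - (T_y + 1) \log T_y
= \max_{\Theta \in \{\Theta\}_{(T_y)}} \{\log f(Y, \Theta \mid M_y)\}. \tag{68}$$

A correspondence between progressive state sequences $\Theta_w$ and $\Theta_y$ is
made through warping functions $\phi$ or $\zeta$, as defined in (61) or (62).
Within such a framework,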
$$f(Y, \zeta(\Theta_y) \mid M_w) = \left( \frac{1}{T_w} \right)^{T_y + 1} \prod_{i=1}^{T_y} f_{\zeta(i)}(y_i \mid a_{w,\zeta(i)}, \sigma_{w,\zeta(i)}^2) \tag{69}$$

and

$$\log f(Y, \zeta(\Theta_y) \mid M_w)
= -\frac{N}{2} \sum_{i=1}^{T_y} \log(2\pi\sigma_{w,\zeta(i)}^2)
- \frac{1}{2} \sum_{i=1}^{T_y} \alpha(\sigma_{w,\zeta(i)}^{-1} y_i; a_{w,\zeta(i)})
- (T_y + 1) \log T_w. \tag{70}$$

In the above we have used $\zeta(\Theta_y)$ to denote the $\zeta$-warped progressive
sequence $\Theta_y$. Note that (68) is also a maximum over all models with
fixed $V_y$ and $u_y$. The difference between (68) and (70),


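$$\log f(Y, \Theta_y \mid M_y) - \log f(Y, \zeta(\Theta_y) \mid M_w)
= \frac{N}{2} \sum_{i=1}^{T_y} \left\{ \frac{1}{N} \alpha(\sigma_{w,\zeta(i)}^{-1} y_i; a_{w,\zeta(i)}) + \log \sigma_{w,\zeta(i)}^2 - \log \sigma_{y,i}^2 - 1 \right\}
+ (T_y + 1)(\log T_w - \log T_y), \tag{71}$$

is thus nonnegative if $T_w \ge T_y$ (sufficient). In realistic situations,
$\log T_w \approx \log T_y$, so that

$$\log f(Y, \Theta_y \mid M_y) - \log f(Y, \zeta(\Theta_y) \mid M_w)
\approx \frac{N}{2} \sum_{i=1}^{T_y} \left\{ \frac{1}{N} \alpha(\sigma_{w,\zeta(i)}^{-1} y_i; a_{w,\zeta(i)}) + \log \sigma_{w,\zeta(i)}^2 - \log \sigma_{y,i}^2 - 1 \right\}
= \frac{N}{2} \sum_{i=1}^{T_y} d_{IS}[y_i; w_{\zeta(i)}]
= \frac{N}{2} D_\zeta. \tag{72}$$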


Therefore, the accumulative distortion $D_\zeta$ in dynamic time warping is
directly related to the likelihood difference between the two models in
generating the $Y$ sequence. To express the density $f(Y \mid M_w)$ in terms of
$D_\zeta$ we further have

$$f(Y \mid M_w) = \sum_{\text{all } \zeta} f(Y, \zeta(\Theta_y) \mid M_w)
= f(Y, \Theta_y \mid M_y) \left\{ \sum_{\text{all } \zeta} \exp\left( -\frac{N}{2} D_\zeta \right) \right\}. \tag{73}$$

The same results can be extended to $f(W \mid M_y)$,

$$f(W \mid M_y) = \sum_{\text{all } \phi} f(W, \phi(\Theta_w) \mid M_y)
= f(W, \Theta_w \mid M_w) \left\{ \sum_{\text{all } \phi} \exp\left( -\frac{N}{2} D_\phi \right) \right\}. \tag{74}$$

Equation (72) demonstrates that determining a warping function
$\zeta$ in dynamic time warping is equivalent to finding only the best
state sequence that maximizes the density $f(Y, \zeta(\Theta_y) \mid M_w)$. On the
other hand, the density $f(Y \mid M_w)$ can be calculated by summing
$\exp\{-(N/2) D_\zeta\}$ over all possible warping paths $\zeta$ and then multiplying
this sum of exponential terms by a constant $f(Y, \Theta_y \mid M_y)$. Since
$f(Y, \Theta_y \mid M_y)$ is defined by the unknown sequence $Y$ only, it does not
affect the classification problem for recognition formulated above. The
accumulative distortions $D_\zeta$ are then the key determining factor.
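A small numerical sketch of the contrast drawn above, under the equiprobable model in which every map from test frames to reference states is allowed. The local distortion table `d`, the frame length `N`, and the function names are made up for illustration. Because the paths are unconstrained here, both the best path and the sum over all paths factor frame by frame, which the brute-force enumeration confirms.

```python
import itertools
import math


def brute_force(d, N):
    """Enumerate every warping zeta: test frame i -> reference state zeta[i].

    d[i][j] is a hypothetical local distortion between test frame i and
    reference state j. Returns (min over zeta of D, sum over zeta of exp(-N/2 D)).
    """
    Ty, Tw = len(d), len(d[0])
    best, total = float("inf"), 0.0
    for zeta in itertools.product(range(Tw), repeat=Ty):
        D = sum(d[i][zeta[i]] for i in range(Ty))
        best = min(best, D)                 # what time warping keeps
        total += math.exp(-0.5 * N * D)     # what the density sums
    return best, total


def factored(d, N):
    """Same two quantities, computed frame by frame (valid only without path constraints)."""
    best = sum(min(row) for row in d)
    total = math.prod(sum(math.exp(-0.5 * N * x) for x in row) for row in d)
    return best, total
```

The sum over paths is never smaller than the single term contributed by the best path, so the two scores can rank reference words differently when many mediocre paths accumulate.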

Although we have defined the transition probabilities $v_{ij}$ to be all
equal for any $i$ and $j$, it is not absolutely necessary for the results of
(72). It has, however, a simple but important interpretation in the
calculation of $f(Y \mid M_w)$. Equal $v_{ij}$ for $i, j = 1, 2, \ldots, T_w$ allows the
input sequence $\Theta_y$ to be warped in every possible way and in every
possible permutation. One thus may not expect a good recognition
performance based upon the density $f(Y \mid M_w) = \sum_\zeta f(Y, \zeta(\Theta_y) \mid M_w)$,
as the time order of the observed speech sequence is crucially important.
(A reversed “we” may sound very close to “you”!) For word
recognition, some constraints on the transitions are therefore desirable.
Markov models for two types of serial constraints, for example,
are shown in Fig. 4. In Fig. 4a, single and double transitions are
allowed. Similar results linking dynamic time warping and Markov
modeling can be obtained in this case if







Fig. 4—Markov chains for two types of serial constraints: (a) single and double 
transitions permitted, and (b) only single transitions permitted. 
These kinds of constraints all have their counterparts in dynamic time
warping, appearing as the continuity conditions for determining the 
warping function. It is expected that these constant transition proba- 
bilities lead to a similar approximation, as appeared in (72). Unlike 
(71), however, the lack of exactness in (72) is now caused by the 
increased transition probabilities at the end of the sequence. The idea 
of unconstrained endpoint algorithm in dynamic time warping to 
correct for the abrupt change in transition “possibilities” at word 
boundaries is, hence, noted. 


VI. SUMMARY 


We have given a unified theoretical view on the two dominant 
speech recognition techniques, namely dynamic time warping and 
Markov modeling. We described the role of some well-known distor- 
tion measures in the context of probability densities. After the rela- 
tionship between probability density and distortion measures is made 
explicit, the similarities between the two techniques can be seen. We 
have shown that if the underlying transition structure is equiprobable, 
dynamic time warping is equivalent to the probabilistic modeling 
technique except that it searches for the best transition path to
minimize the accumulative distortion, while the probabilistic tech- 
nique sums the density along every possible path. The results show 
that the two techniques may not be mutually exclusive, particularly 
when the density functions are exponential and LP-related distortion 
measures are used. The discussion may be helpful in bringing about a 
better understanding of each technique and possible future improve- 
ments. 
APPENDIX A
Reestimation—Single Observation 


We first show that the new, improved estimate $(a_i', \sigma_i'^2)$ satisfies
(51) and (52). We follow Ref. 16 and define the Q-function as

$$Q(M, M') = \sum_{\text{all } \Theta} f(S, \Theta \mid M) \log f(S, \Theta \mid M'), \tag{75}$$

where

$$\log f(S, \Theta \mid M')
= \sum_{j=1}^{K} (\log u_j')\, \delta(\theta_0 - j)
+ \sum_{k=1}^{K} \sum_{l=1}^{K} \sum_{t=1}^{T} (\log v_{kl}')\, \delta(\theta_{t-1} - k)\, \delta(\theta_t - l)
+ \sum_{i=1}^{K} \sum_{t=1}^{T} [\log f_i'(s_t)]\, \delta(\theta_t - i), \tag{76}$$

$\delta(\cdot)$ is the Kronecker delta, and

$$f_i'(s) = f(s \mid a_i', \sigma_i'^2)
= (2\pi)^{-N/2} (\sigma_i'^2)^{-N/2} \exp\left\{ -\frac{1}{2} \alpha(\sigma_i'^{-1} s; a_i') \right\} \tag{77}$$

according to (17). It has been shown that¹⁶

$$f(S \mid M') = \sum_{\text{all } \Theta} u_{\theta_0}' \prod_{t=1}^{T} v_{\theta_{t-1}\theta_t}' f_{\theta_t}'(s_t)
\ge \sum_{\text{all } \Theta} u_{\theta_0} \prod_{t=1}^{T} v_{\theta_{t-1}\theta_t} f_{\theta_t}(s_t)
= f(S \mid M)$$

if $Q(M, M') \ge Q(M, M)$. Therefore, we may obtain a new, improved
estimate by maximizing $Q(M, M')$ with respect to $M'$. Equation (76)
shows that the contributions due to $u'$, $V'$, and $F'$ are separated and
maximization of $Q(M, M')$ can thus be carried out independently with
respect to each parameter set. In particular,


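$$Q(M, M') = Q_u(M, u') + Q_V(M, V') + Q_F(M, F')
= Q_u(M, u') + Q_V(M, V') + \sum_{i=1}^{K} Q_{f_i}(M, f_i'),$$

where

$$Q_u(M, u') = \sum_{\text{all } \Theta} f(S, \Theta \mid M) \sum_{j=1}^{K} (\log u_j')\, \delta(\theta_0 - j),$$

$$Q_V(M, V') = \sum_{\text{all } \Theta} f(S, \Theta \mid M) \sum_{k=1}^{K} \sum_{l=1}^{K} \sum_{t=1}^{T} (\log v_{kl}')\, \delta(\theta_{t-1} - k)\, \delta(\theta_t - l),$$

and

$$Q_{f_i}(M, f_i') = \sum_{\text{all } \Theta} f(S, \Theta \mid M) \sum_{t=1}^{T} [\log f_i'(s_t)]\, \delta(\theta_t - i).$$

We shall not repeat the treatment of $u'$ and $V'$ here, but only discuss
the maximization of $Q_{f_i}(M, f_i')$. We rewrite $Q_{f_i}(M, f_i')$ as

$$Q_{f_i}(M, f_i') = \sum_{t=1}^{T} \sum_{\text{all } \Theta} f(S, \Theta \mid M)\, \delta(\theta_t - i) \log f_i'(s_t)
= \sum_{t=1}^{T} f(S, \theta_t = i \mid M) \log f_i'(s_t)
= \sum_{t=1}^{T} f(S, \theta_t = i \mid M) \left\{ -\frac{N}{2} \log 2\pi - \frac{N}{2} \log \sigma_i'^2 - \frac{1}{2} \alpha(\sigma_i'^{-1} s_t; a_i') \right\}. \tag{78}$$

Maximizing $Q_{f_i}(M, f_i')$ with respect to $a_i'$ is equivalent to minimizing

$$\sum_{t=1}^{T} f(S, \theta_t = i \mid M)\, \alpha(s_t; a_i')
= \sum_{t=1}^{T} f(S, \theta_t = i \mid M) \left[ r_{a'}(0) r_t(0) + 2 \sum_{j=1}^{M} r_{a'}(j) r_t(j) \right]
= r_{a'}(0) \left[ \sum_{t=1}^{T} f(S, \theta_t = i \mid M)\, r_t(0) \right]
+ 2 \sum_{j=1}^{M} r_{a'}(j) \left[ \sum_{t=1}^{T} f(S, \theta_t = i \mid M)\, r_t(j) \right], \tag{79}$$

where $r_{a'}(j)$ and $r_t(j)$ are the j-lag autocorrelation coefficients of $a_i'$
and $s_t$, respectively. Since (79) is simply $\alpha(\bar{s}_i; a_i')$, (51) is thus proved.
Equation (52) then follows from maximizing (78) with respect to $\sigma_i'^2$
given $a_i'$.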
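The step from the weighted sum in (79) to the residual energy of a single composite observation rests on the fact that the residual energy is linear in the signal autocorrelation. A minimal sketch of that identity follows; the frames and weights are made up (the weights stand in for the state-occupancy terms of (79)), and writing the predictor as `[1, a_1, ..., a_M]` is a convention assumed here.

```python
def autocorr(x, max_lag):
    """r(j) = sum_n x[n] * x[n + j] for j = 0..max_lag (samples outside x are zero)."""
    return [sum(x[n] * x[n + j] for n in range(len(x) - j))
            for j in range(max_lag + 1)]


def residual_energy(r_s, a):
    """alpha(s; a) = r_a(0) r_s(0) + 2 * sum_{j>=1} r_a(j) r_s(j).

    r_s: signal autocorrelation for lags 0..M; a: predictor [1, a_1, ..., a_M].
    """
    r_a = autocorr(a, len(a) - 1)
    return r_a[0] * r_s[0] + 2.0 * sum(r_a[j] * r_s[j] for j in range(1, len(a)))


def composite_autocorr(frames, weights, max_lag):
    """Weighted sum of per-frame autocorrelations (the bracketed sums in (79))."""
    total = [0.0] * (max_lag + 1)
    for s, w in zip(frames, weights):
        for j, r in enumerate(autocorr(s, max_lag)):
            total[j] += w * r
    return total
```

For any predictor `a`, summing `w * residual_energy(autocorr(s, M), a)` over frames gives the same number as `residual_energy(composite_autocorr(frames, weights, M), a)`, so minimizing the weighted sum over `a` is an ordinary linear-prediction problem on the composite autocorrelation.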




APPENDIX B 


Reestimation—Multiple Observations 


We apply the Baum-Welch algorithm to maximize

$$f(S^{(1)}, S^{(2)}, \ldots, S^{(L)} \mid M) = \prod_{i=1}^{L} f(S^{(i)} \mid M), \tag{80}$$

given multiple observations $S^{(1)}, S^{(2)}, \ldots, S^{(L)}$. For brevity, we use $\{S\}$
to denote the set of observations $S^{(i)}$, $i = 1, 2, \ldots, L$. In addition, for
each observation sequence $S^{(i)}$, there is a corresponding probable state
sequence $\Theta^{(i)}$, and

$$f(S^{(i)} \mid M) = \sum_{\text{all } \Theta^{(i)}} f(S^{(i)}, \Theta^{(i)} \mid M).$$

We further use $\{\Theta\}$ to denote the set of probable state sequences
behind the set of observations $\{S\}$. Then (80) becomes

$$f(\{S\} \mid M) = \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M). \tag{81}$$


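Accordingly, we define the Q-function as follows:

$$Q(M, M') = \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \log f(\{S\}, \{\Theta\} \mid M'),$$

where

$$\log f(\{S\}, \{\Theta\} \mid M')
= \sum_{j=1}^{K} \sum_{i=1}^{L} (\log u_j')\, \delta(\theta_0^{(i)} - j)
+ \sum_{k=1}^{K} \sum_{l=1}^{K} \sum_{i=1}^{L} \sum_{t=1}^{T} (\log v_{kl}')\, \delta(\theta_{t-1}^{(i)} - k)\, \delta(\theta_t^{(i)} - l)
+ \sum_{m=1}^{K} \sum_{i=1}^{L} \sum_{t=1}^{T} [\log f_m'(s_t^{(i)})]\, \delta(\theta_t^{(i)} - m), \tag{82}$$

and $f_m'(s)$ is defined as in (77). We shall address the maximization of
$Q(M, M')$ with respect to $V' = [v_{kl}']$ and $F' = \{f_m'\}$. Extension of
previous results on the initial probabilities to the current case of
multiple observations is straightforward. Again, (82) shows separate
contributions from different parameter sets and, hence, we write

$$Q(M, M') = Q_u(M, u') + Q_V(M, V') + Q_F(M, F'),$$

where

$$Q_u(M, u') = \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{j=1}^{K} \sum_{i=1}^{L} (\log u_j')\, \delta(\theta_0^{(i)} - j),$$

$$Q_V(M, V') = \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{k=1}^{K} \sum_{l=1}^{K} \sum_{i=1}^{L} \sum_{t=1}^{T} (\log v_{kl}')\, \delta(\theta_{t-1}^{(i)} - k)\, \delta(\theta_t^{(i)} - l)
= \sum_{k=1}^{K} \left[ \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{l=1}^{K} \sum_{i=1}^{L} \sum_{t=1}^{T} (\log v_{kl}')\, \delta(\theta_{t-1}^{(i)} - k)\, \delta(\theta_t^{(i)} - l) \right]
= \sum_{k=1}^{K} Q_{v_k}(M, v_k'),$$

and

$$Q_F(M, F') = \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{m=1}^{K} \sum_{i=1}^{L} \sum_{t=1}^{T} [\log f_m'(s_t^{(i)})]\, \delta(\theta_t^{(i)} - m)
= \sum_{m=1}^{K} \left[ \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{i=1}^{L} \sum_{t=1}^{T} [\log f_m'(s_t^{(i)})]\, \delta(\theta_t^{(i)} - m) \right]
= \sum_{m=1}^{K} Q_{f_m}(M, f_m').$$

In the above, $v_k' = [v_{k1}'\ v_{k2}'\ \ldots\ v_{kK}']'$ and

$$\sum_{l=1}^{K} v_{kl}' = 1 \quad \text{for } k = 1, 2, \ldots, K. \tag{83}$$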
(1 


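Let $\tilde{Q}$ be the Lagrangian of $Q_{v_k}(M, v_k')$ with respect to the constraint
(83),

$$\tilde{Q} = Q_{v_k}(M, v_k') + \lambda \left\{ \sum_{j=1}^{K} v_{kj}' - 1 \right\}.$$

We need to solve the equation

$$\frac{\partial Q_{v_k}}{\partial v_{kj}'} + \lambda = 0. \tag{84}$$

Then, we have

$$\frac{\partial Q_{v_k}}{\partial v_{kj}'} = -\lambda = -\sum_{m=1}^{K} v_{km}' \lambda = \sum_{m=1}^{K} v_{km}' \frac{\partial Q_{v_k}}{\partial v_{km}'}. \tag{85}$$

Since

$$\frac{\partial Q_{v_k}}{\partial v_{kj}'}
= \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{i=1}^{L} \sum_{t=1}^{T} \frac{1}{v_{kj}'}\, \delta(\theta_{t-1}^{(i)} - k)\, \delta(\theta_t^{(i)} - j)
= \frac{1}{v_{kj}'} \sum_{i=1}^{L} \sum_{t=1}^{T} \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M)\, \delta(\theta_{t-1}^{(i)} - k)\, \delta(\theta_t^{(i)} - j)
= \frac{1}{v_{kj}'} \sum_{i=1}^{L} \sum_{t=1}^{T} f(S^{(i)}, \theta_{t-1}^{(i)} = k, \theta_t^{(i)} = j \mid M) \cdot f(\{S\} \mid M) / f(S^{(i)} \mid M), \tag{86}$$

and

$$\sum_{j=1}^{K} v_{kj}' \frac{\partial Q_{v_k}}{\partial v_{kj}'}
= \sum_{j=1}^{K} \sum_{i=1}^{L} \sum_{t=1}^{T} f(S^{(i)}, \theta_{t-1}^{(i)} = k, \theta_t^{(i)} = j \mid M) \cdot f(\{S\} \mid M) / f(S^{(i)} \mid M)
= \sum_{i=1}^{L} \sum_{t=1}^{T} f(S^{(i)}, \theta_{t-1}^{(i)} = k \mid M) \cdot f(\{S\} \mid M) / f(S^{(i)} \mid M), \tag{87}$$

we arrive at the solution of (55). The term $f(\{S\} \mid M)$ in the above has
no effect as it appears in both (86) and (87).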

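Next, we deal with the maximization of $Q_{f_j}(M, f_j')$. Note from above
that

$$Q_{f_j}(M, f_j') = \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M) \sum_{i=1}^{L} \sum_{t=1}^{T} [\log f_j'(s_t^{(i)})]\, \delta(\theta_t^{(i)} - j)
= \sum_{t=1}^{T} \sum_{i=1}^{L} \left\{ \sum_{\text{all } \{\Theta\}} f(\{S\}, \{\Theta\} \mid M)\, \delta(\theta_t^{(i)} - j) \right\} \log f_j'(s_t^{(i)})
= \sum_{t=1}^{T} \sum_{i=1}^{L} f(S^{(i)}, \theta_t^{(i)} = j \mid M)
\left\{ -\frac{N}{2} \log 2\pi - \frac{N}{2} \log \sigma_j'^2 - \frac{1}{2} \alpha(\sigma_j'^{-1} s_t^{(i)}; a_j') \right\}
\cdot f(\{S\} \mid M) / f(S^{(i)} \mid M), \tag{88}$$

an expression similar to (78). Maximizing $Q_{f_j}(M, f_j')$ with respect to
$a_j'$ is equivalent to minimizing

$$\sum_{t=1}^{T} \sum_{i=1}^{L} f(S^{(i)}, \theta_t^{(i)} = j \mid M) [f(\{S\} \mid M) / f(S^{(i)} \mid M)]\, \alpha(s_t^{(i)}; a_j')
= r_{a'}(0) \left\{ \sum_{t=1}^{T} \sum_{i=1}^{L} f(S^{(i)}, \theta_t^{(i)} = j \mid M) [f(\{S\} \mid M) / f(S^{(i)} \mid M)]\, r_t^{(i)}(0) \right\}
+ 2 \sum_{m=1}^{M} r_{a'}(m) \left\{ \sum_{t=1}^{T} \sum_{i=1}^{L} f(S^{(i)}, \theta_t^{(i)} = j \mid M) [f(\{S\} \mid M) / f(S^{(i)} \mid M)]\, r_t^{(i)}(m) \right\}, \tag{89}$$

where $r_{a'}(m)$ and $r_t^{(i)}(m)$ are the m-lag autocorrelation coefficients of
$a_j'$ and $s_t^{(i)}$ sequences, respectively. Similar to the development in
Appendix A, (89) is simply $\alpha(\bar{s}_j; a_j')$ and (58) must be satisfied in order
to maximize $Q_{f_j}(M, f_j')$ with respect to $a_j'$. Equation (59) then follows
from maximizing (88) with respect to $\sigma_j'^2$ given $a_j'$.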
In this paper we present results of a series of experiments in which
combinations of vector quantization and temporal energy contours are incor- 
porated into the standard framework for the word recognizer. We consider 
two distinct word vocabularies, namely, a set of 10 digits, and a 129-word 
airlines vocabulary. We show that the incorporation of energy leads to small 
but consistent improvements in performance for the digits vocabulary; the 
incorporation of vector quantization (in a judicious manner) leads to small 
degradation in performance for both vocabularies, but at the same time reduces 
overall computation of the recognizer by a significant amount. We conclude 
that a high-performance, moderate-computation, isolated word recognizer can 
be achieved using vector quantization and the temporal energy contour. 


I. INTRODUCTION


The most popular form for an isolated word recognition system is 
the classic statistical pattern recognition implementation shown in 
Fig. 1. In this model the speech signal is first analyzed by the feature 
measurement block, which produces a test pattern consisting of a 
temporal sequence of (spectral) feature vectors characteristic of the 
speech sounds in the word. Most typically, the feature measurement 
system is either a bank of highly overlapping (in frequency) bandpass 
filters, or a Linear Predictive Coding (LPC) analysis. In either case 
Fig. 1—Pattern recognition model of isolated word recognition system.


the feature vector, for a given time interval, is an estimate of the 
short-time spectrum of the speech signal. The job of the pattern 
comparison block is to register, in time, the test pattern with each of 
a set of stored reference patterns, and to determine a similarity 
(distance) score for each such pair of patterns. It has been shown that 
one must use some type of dynamic programming algorithm to achieve 
the required degree of time alignment for arbitrary word vocabularies.¹
Associated with the time alignment is a spectral distance measure for 
comparing frames of the test and reference patterns. Such distance 
measures are often as simple as summing spectral magnitude differences,
or as complex as the likelihood ratio measure.² The final stage
of the pattern recognition model of Fig. 1 is the decision rule that 
makes a recognition decision, or possibly a set of decisions, based on 
the distance or similarity scores provided by the pattern comparison 
block. The most widely used decision rule is the nearest-neighbor rule, 
which chooses the recognized word as the one whose pattern has the 
smallest distance score. Alternative decision rules are variants of the 
K-Nearest-Neighbor (KNN) rule.³

A wide variety of isolated word recognizers have been designed, 
based on the structure given in Fig. 1, and have been shown to yield 
good performance for several types of word vocabularies and talker 
sets.⁴,⁵ The major obstacle to the widespread use of such recognizers
for simple applications (home computers, terminals, etc.) is the inher- 
ent cost of the implementation. This cost, either in terms of compu- 
tation or actual dollars, is primarily due to the cost of implementing 
the pattern comparison block with a dynamic programming algorithm. 
Several alternative recognition structures have been proposed for 
reducing the cost of the recognizer. These include replacing the non- 
parametric model of Fig. 1 with a parametric model [e.g., a Hidden 
Markov Model (HMM)],⁶ using recognition structures without time
alignment procedures,⁷ and using some coding technique on the feature
vectors to significantly reduce computation in the dynamic programming
algorithm.⁸ The first two alternative recognition strategies are
still under investigation, but at the present time they yield degraded
performance for several standard vocabularies in the speaker-independent
mode. The third alternative is the subject of this paper.
Shikano has presented some results on trade-offs between computa- 
tion and performance achievable using coding techniques. We extend 
his results and apply them to two useful and interesting word vo- 
cabularies, and consider their applicability in a speaker-independent 
mode. 

The actual recognition structure used in this paper is an LPC-based 
system, which uses the likelihood distance metric in a Dynamic Time 
Warping (DTW) implementation of the pattern similarity block of 
Fig. 1. The coding technique used to reduce computation in the DTW 
algorithm is LPC Vector Quantization (VQ).⁹,¹⁰ The way to reduce
computation is to replace each feature vector in the reference pattern 
by one of a set of fixed LPC vectors from a code book (designed from 
an appropriate training set). If we similarly replace each feature vector 
of the test pattern by the closest vector in the code book, then, by 
precomputing the matrix of distances of each code-book vector to 
every other code-book vector, the distance computation of the DTW 
algorithm becomes a simple table-lookup operation. Since the distance 
computation dominates the overall computation of the recognizer, 
significant reductions in computation are achieved with this technique. 
It remains to be shown that performance degradation (due to the 
distortion introduced by vector quantization) can be kept small. 

This paper also discusses the application of temporal energy con- 
tours to the recognizer structure of Fig. 1. Previous work by Brown 
and Rabiner¹¹ shows that by treating the energy contour (normalized
over the entire word duration) as a new feature, and by incorporating 
this energy feature into the distance metric as an independent, additive 
feature, performance of the conventional DTW recognizer was improved.
Work by Rabiner et al.¹² shows how vector quantization design
algorithms can incorporate energy directly into the standard code- 
book design procedure, yielding a joint quantization of the LPC vector 
and its energy value. In this paper we integrate both these results into 
a common framework. We also implement an isolated word recognizer,
incorporating vector quantization to reduce computation and using 
temporal energy contours to achieve performance comparable to that 
of the DTW system without using either VQ or energy. 

The organization of this paper is as follows. In Section II we describe 
the implementation of the isolated word recognizer using vector quan- 
tization and temporal energy contours. In Section III we describe and 
give results from a series of evaluation tests on two distinct sets of 
word vocabularies to show the performance of the overall word recog- 
nizer in a speaker-independent mode. In Section IV we discuss the 
results and compare them to those obtained in other studies of computational
reduction techniques. Finally, in Section V we summarize
our findings.


II. STRUCTURE OF THE OVERALL ISOLATED WORD RECOGNIZER


Figure 2 is a block diagram of the word recognizer incorporating 
vector quantization of the LPC-feature vectors and using temporal 
energy contours. The speech signal, recorded off a dialed-up telephone 
line and digitized at 6.67-kHz rate, is first blocked into 45-ms frames 
and analyzed to give LPC vectors every 15 ms (100 samples) using the 
autocorrelation method. A Hamming window is applied to the 45-ms 
(300 samples) section of speech, which has been preemphasized using 
a first-order digital network, and a set of (p + 1) autocorrelations are 
computed. In our implementation we use p = 8 poles for the telephone 
bandwidth signal. The signal energy (unnormalized) for the $l$th frame
of speech is the zeroth-order autocorrelation, $R_l(0)$. We denote the
pth-order LPC vector for the $l$th frame as $a_l$, and the log energy for
the $l$th frame (after normalization, which will be described later) as
$E_l$.

The next stage in the processing of Fig. 2 is vector quantization of
the LPC vector (with or without the energy parameter). To perform
vector quantization, we need a predesigned code book of vectors and
an appropriate distance metric for comparing the LPC vectors of the
speech signal with the prestored code-book vectors. If we denote an
arbitrary test vector as the pair $\tilde{a} = (a, E^T)$, and an arbitrary code-book
vector as the pair $\tilde{b} = (b, E^R)$, then an appropriate distance for
comparing $\tilde{a}$ and $\tilde{b}$ is¹¹

$$d(\tilde{a}, \tilde{b}) = \left[ \frac{b' V^T b}{a' V^T a} - 1 \right] + \alpha f(|E^T - E^R|), \tag{1}$$

where $V^T$ is the Toeplitz matrix of autocorrelations of the test frame,
$\alpha$ is a suitable weighting factor on the energy distance, and $f(x)$ is a
nonlinearity of the type

$$f(x) = \begin{cases}
0, & |x| \le E_{LO} \\
|x| - E_{LO} + E_{OF}, & E_{LO} < |x| \le E_{HI} + E_{LO} - E_{OF} \\
E_{HI}, & |x| > E_{HI} + E_{LO} - E_{OF},
\end{cases} \tag{2}$$

where $E_{LO}$, $E_{HI}$, and $E_{OF}$ are appropriately chosen thresholds and
energy offsets.

The first term in brackets in eq. (1) is the conventional Itakura
LPC likelihood ratio (in its linear form),² and the second term is an
energy distance that is added to the LPC distance. The weighting
factor, $\alpha$, accounts for the fact that energy distances bear significantly
less information than LPC distances. The nonlinear function, $f(x)$,
accounts for the fact that small energy differences (i.e., less than $E_{LO}$
dB) are insignificant and hence should add no distance, and that large
energy distances should be clipped at some appropriate value. Energy
distances between these extremes are linearly weighted. In practice,
because the Itakura likelihood ratio is essentially unbounded in value,
a clipping function, $g$, is also applied in eq. (1) so that

$$g(x) = \begin{cases}
x, & x \le D_{CLIP} \\
D_{CLIP}, & x > D_{CLIP},
\end{cases} \tag{3}$$

where $x$ is the expression in brackets in eq. (1).

Fig. 2—LPC/DTW recognizer using vector quantization.

A complete specification of the distance metric of eq. (1) requires
specification of values for the LPC clipping threshold, $D_{CLIP}$, and the
energy thresholds, $E_{LO}$, $E_{HI}$, and $E_{OF}$. These values are obtained by
experimentation in a small pilot test, and specific values will be given
in Section III.
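The pieces of the distance metric can be sketched as follows. The default thresholds are the values quoted later for the energy-based runs (6 dB, 20 dB, 0 dB, and a clipping threshold of 2.5); the energy weight `alpha=0.1` is a placeholder, since no value is given in this section, and the LPC likelihood-ratio term is assumed to be computed elsewhere and passed in as `lpc_term`. Function and parameter names are illustrative.

```python
def energy_nonlinearity(x, e_lo=6.0, e_hi=20.0, e_of=0.0):
    """f(x) of eq. (2): dead zone below e_lo, linear region, clipping at e_hi (all in dB)."""
    ax = abs(x)
    if ax <= e_lo:
        return 0.0
    if ax <= e_hi + e_lo - e_of:
        return ax - e_lo + e_of
    return e_hi


def clip(x, d_clip=2.5):
    """g(x) of eq. (3): bound the LPC likelihood-ratio term."""
    return min(x, d_clip)


def frame_distance(lpc_term, e_test, e_ref, alpha=0.1):
    """Eq. (1) with the clipped LPC term.

    lpc_term is the bracketed likelihood-ratio expression, computed elsewhere.
    alpha=0.1 is a hypothetical weight, not a value taken from the paper.
    """
    return clip(lpc_term) + alpha * energy_nonlinearity(e_test - e_ref)
```

With the defaults, an energy difference of 3 dB contributes nothing, 10 dB contributes 4 (before weighting), and anything beyond 26 dB is held at 20.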

Once the distance metric of eq. (1) is given, the implementation of
the vector quantization stage of Fig. 2 using a code book with $M^*$
vectors, for the $l$th feature vector, is a computation of the form

$$d^* = \min_{1 \le m \le M^*} d(\tilde{a}_l, \tilde{b}_m) \tag{4a}$$

$$m^* = \arg\min_{1 \le m \le M^*} d(\tilde{a}_l, \tilde{b}_m) \tag{4b}$$

$$\tilde{a}_l \Rightarrow \tilde{b}_{m^*}, \tag{4c}$$

i.e., we find the code-book vector $\tilde{b}_{m^*}$ such that its distance to the
analysis vector $\tilde{a}_l$ is minimum over all code-book entries, and we
replace $\tilde{a}_l$ by $\tilde{b}_{m^*}$. (Equivalently, all we need to save is the index $m^*$,
which gives the code-book vector with the minimum distance, since
the code-book vectors are fixed.)
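A minimal sketch of the quantization step of eq. (4). The distance function is passed in as an argument; the squared Euclidean distance in the usage note is only a stand-in for the LPC/energy distance of eq. (1), and the function name is illustrative.

```python
def quantize(frame, codebook, dist):
    """Return (d_star, m_star) as in eqs. (4a) and (4b).

    Only the index m_star needs to be stored, because the code book is fixed.
    """
    m_star = min(range(len(codebook)), key=lambda m: dist(frame, codebook[m]))
    return dist(frame, codebook[m_star]), m_star
```

For instance, with `dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))` and the toy code book `[(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]`, the frame `(0.9, 1.2)` is assigned index 1.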

Once a test feature vector has been vector quantized, and similarly,
each reference feature vector has been vector quantized, then the
calculation of distance between test and reference feature vectors (as
required in the DTW alignment procedure) becomes simply a table-lookup
procedure from a table of all possible distances between code-book
vectors. Thus, if we define the $M^*$ by $M^*$ matrix of code-book
vector distances as

$$D_{CB}(i, j) = d(\tilde{b}_i, \tilde{b}_j), \qquad 1 \le i \le M^*, \quad 1 \le j \le M^*, \tag{5}$$

and precompute $D_{CB}(i, j)$ and store it, then the distance between a
test vector coded by code-book vector $i^*$, and a reference vector coded
by code-book vector $j^*$, is simply $D_{CB}(i^*, j^*)$, and is computed in the
time for a single table-lookup. Hence, the number of distance computations
[nominally needing about (p + 1) multiplications and additions]
is essentially reduced to zero. In this manner the computation
of distance from the DTW alignment is substantially reduced. The
technique of vector quantizing both the test and reference patterns,
and then using the matrix of distances for a table-lookup computation
is called a double-SPLIT VQ by Shikano.⁸ For the double-SPLIT
method, a full LPC analysis (i.e., the Levinson recursion) does not
need to be carried out since the prediction residual is common to all
distances in the minimization of eq. (4) and hence, need not be
computed.

It should be noted that the process of vector quantization of $a_l$ leads
to a distortion error, $\epsilon_l$, given by

$$\epsilon_l = a_l - b_{m^*}, \tag{6}$$

since the actual feature vector does differ from the code-book vector.
One way of avoiding this distortion in the test feature vector, due to
Sakoe,¹³ is to save the vector of distances, $d(l, m)$, of the form

$$d(l, m) = d(\tilde{a}_l, \tilde{b}_m) \tag{7}$$

for all frames, $l$, of the test pattern, and all code-book indices, $m$. In
this manner whenever a distance is required between the true test
feature vector, $a_l$, and a reference vector quantized to code-book vector,
$b_q$, then the distance can be looked up in the distance vector for frame
$l$ as the $q$th entry. Thus, we eliminate storage for the $M^*$ by $M^*$
distances of the code-book vectors, but we instead need storage for the
$M^*$ by $L$ distances, for a word of $L$ frames, of the vector quantizer.
Since $L < M^*$, in most cases, this simplification of the vector quantization
generally both increases performance of the recognizer (since
no distortion of the test vectors is incurred) and decreases storage of
the system. This technique is called a single-SPLIT VQ by Shikano.⁸
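The two lookup schemes can be contrasted in a few lines. In this sketch the code book holds one-dimensional toy vectors and the distance is a squared difference, both assumptions made only to keep the example small; the function names are illustrative.

```python
def codebook_distance_table(codebook, dist):
    """Double-SPLIT: M* x M* table of code-book-to-code-book distances, eq. (5).

    Computed once, independent of the test word.
    """
    return [[dist(bi, bj) for bj in codebook] for bi in codebook]


def frame_distance_table(test_frames, codebook, dist):
    """Single-SPLIT: L x M* table of true-test-frame-to-code-book distances, eq. (7).

    Computed once per test word; the test frames are not quantized.
    """
    return [[dist(a, b) for b in codebook] for a in test_frames]


def nearest_index(frame, codebook, dist):
    """Index of the closest code-book vector (the quantization step)."""
    return min(range(len(codebook)), key=lambda m: dist(frame, codebook[m]))
```

During alignment, a double-SPLIT local distance is `d_cb[i_star][j_star]`, where both indices come from quantization, while a single-SPLIT local distance is `d_frames[l][q]`, where only the reference index `q` is quantized. The second keeps the residual distance between the true test frame and the reference code word instead of rounding it away.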

The remaining steps in the recognizer of Fig. 2 are essentially those 
of a conventional DTW-based word recognizer. The DTW alignment 
compares the test pattern (in some type of VQ format) to each 
reference pattern (coded as a series of code-book vectors) and generates 
a distance score. The KNN rule examines the best K scores for each 
vocabulary word and gives an ordered list of word distances based on 
the average of the K scores. The “recognized” word is selected as the 
word whose best-K patterns have the smallest average score. 
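The KNN decision described above can be sketched as follows. The dictionary layout, the function name, and the example words and scores are illustrative assumptions; the rule itself (average the K smallest distances per word, then rank) follows the description in the text.

```python
def knn_decision(scores_by_word, k=2):
    """Rank vocabulary words by the mean of their k smallest DTW distances.

    scores_by_word maps each word to the distances obtained against that
    word's reference patterns (hypothetical data layout).
    """
    averaged = {
        word: sum(sorted(scores)[:k]) / min(k, len(scores))
        for word, scores in scores_by_word.items()
    }
    return sorted(averaged, key=averaged.get)   # best candidate first
```

With `k=1` this reduces to the nearest-neighbor rule. The two can disagree: for `{"one": [0.9, 0.3, 0.5], "nine": [0.2, 0.8, 0.9]}`, the single best score favors "nine", whereas the average of the best two favors "one".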
2.1 Generation of LPC-vector code book


The generation of vectors in the code book of the vector quantizer 
is straightforward and follows the procedures outlined in Refs. 9, 10, 
and 12. We used a training set of 39,000 LPC vectors with energy 
values. The vectors were extracted from isolated digit sequences spo- 
ken by 100 talkers (50 male, 50 female). Code books of size M* = 2, 4, 
8, 16, 32, 64, and 128 were generated. All results presented in this 
paper are for code-book size 128. Results, on digit evaluation tests, for 
smaller-size code books are given in Ref. 12. 


2.2 Normalization of the energy contours for words 


The raw energy value for the $l$th frame of a word, $\hat{E}_l$, is computed
as

$$\hat{E}_l = 10 \log_{10}(R_l(0)), \qquad l = 1, 2, \ldots, L, \tag{8}$$

where $L$ is the number of frames in the word. The normalization of
energy is performed by finding the maximum energy value, $E_{\max}$, over
the word as

$$E_{\max} = \max_{1 \le l \le L} (\hat{E}_l) \tag{9}$$

and by subtracting $E_{\max}$ from $\hat{E}_l$ to give

$$E_l = \hat{E}_l - E_{\max}. \tag{10}$$


In this manner the peak energy value of each word is 0 dB, and the 
recognition system is relatively insensitive to differences in gain 
between recordings. Of course the computation of eq. (9) means that 
word energy contour normalization cannot take place until the end of 
the word is located. This constraint poses no real difficulty since there 
are ways of implementing an approximate gain normalization in “real 
time” based on some realistic assumptions about the rate of change of 
system gain. 
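The normalization of eqs. (8) to (10) amounts to a few lines. This sketch assumes the zeroth-order autocorrelations of all frames of the word are available at once (the batch case discussed above) and are strictly positive; the function name is illustrative.

```python
import math


def normalized_log_energy(r0):
    """Per-frame log energy in dB, shifted so the word's peak frame sits at 0 dB.

    r0[l] is the zeroth-order autocorrelation of frame l and must be > 0.
    """
    raw = [10.0 * math.log10(r) for r in r0]   # eq. (8)
    e_max = max(raw)                           # eq. (9)
    return [e - e_max for e in raw]            # eq. (10)
```

Scaling every frame energy by a common gain factor leaves the output unchanged, which is the insensitivity to recording gain noted above.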


III. EVALUATION TESTS OF THE RECOGNIZER


To evaluate the effects of the vector quantizer and the use of energy 
contours on the performance of the isolated word recognizer, we 
performed a series of three sets of recognition tests. For the first two 
sets of tests, the word vocabulary was the ten digits (zero through 
nine), and for the third set of tests, a 129-word airlines vocabulary 
was used.¹⁴,¹⁵ All tests were conducted in a speaker-independent mode
in which all test recordings were made off a standard, dialed-up, local 
telephone line. A set of 12 speaker-independent reference patterns, 
obtained from a conventional clustering analysis¹⁶ of 100 tokens per
word, were used for each vocabulary word. 
We obtained three test sets, denoted TS1, TS2, and TS3, in the
following way. TS1 consisted of 100 talkers (50 male, 50 female) who 
each spoke each digit once. These talkers were the same ones who 
generated the training tokens used to create both the isolated-digits 
reference patterns, and the code-book vectors. However, different 
recordings were used for the training sets than for the test set. The 
second test set, TS2, consisted of ten talkers (five male, five female) 
who each spoke each digit 20 times. These talkers were not members 
of the 100-talker training set and were chosen from a set of 100 talkers 
used in a large-scale evaluation of a combined digit recognition, talker 
identification system.¹⁷ The set of ten talkers was chosen on the basis
of preliminary experimentation, since they had an error rate somewhat 
above the average of the 100 talkers used in the experiment. In this 
manner it was hoped that the TS2 data would amplify differences in 
test performance results. 

The third test set, TS3, consisted of 20 talkers (10 male, 10 female) 
who each spoke the entire airlines vocabulary a single time. These 20 
test talkers were different from those who provided the training tokens 
used to give the word reference templates. 


3.1 Results on digits (TS1) 


A series of recognition tests were performed in which temporal 
energy and vector quantization were tried in all combinations with the 
basic LPC-based DTW recognizer. A total of six test results are given 
in Table Ia for the following cases: 

Run 1—Standard LPC-based DTW recognizer without energy and
without VQ.

Run 2—Energy contour used, but no VQ.

Run 3—Energy contour not used and VQ used only on the reference
pattern—i.e., this is a single-SPLIT VQ.

Run 4—Energy contour used combined with VQ only on the reference
pattern.

Run 5—Energy contour not used and VQ used on both test and
reference patterns—i.e., this is a double-SPLIT VQ.

Run 6—Energy contour used combined with VQ on both test and
reference patterns.


Table I—Average digit error rates for the top β word candidates for
six runs

                                    Error Rate (%) for Top β Candidates
Run      Energy   VQ on   VQ on
Number   Used     Ref.    Test      1     2     3     4     5     6

(a) TS1 data
1        No       No      No        2.7   0.8   0.2   0.2   0.2   0
2        Yes      No      No        2.1   0.4   0.2   0.2   O12   0
3        No       Yes     No        3.2   0.9   0.2   0.1   0.1   0
4        Yes      Yes     No        2.4   0.5   0.1   0.1   0     0
5        No       Yes     Yes       4.0   1.1   0.3   0.2   0     0
6        Yes      Yes     Yes       3.5   0.6   0.1   0.1   OL    0

(b) TS2 data
1        No       No      No        3.6   1.1   0.3   0.1   0     0
2        Yes      No      No        2.8   1.1   0.5   0.2   0.1   0
3        No       Yes     No        3.8   1.2   0.3   0.2   0.1   0
4        Yes      Yes     No        4.2   1.5   0.7   0.3   0.1   0
5        No       Yes     Yes       4.1   1.1   0.4   0.2   0.1   0
6        Yes      Yes     Yes       4.0   2.0   0.9   0.3   0.1   0.1

For each of the energy-based runs, the parameters of the energy
distance were set to

$$E_{LO} = 6\ \text{dB}, \qquad E_{HI} = 20\ \text{dB}, \qquad E_{OF} = 0\ \text{dB},$$

and the LPC distance-clipping threshold, $D_{CLIP}$, was set to 2.5. (Some
experimentation was done with values of $E_{OF}$ = 6 dB, as used in Ref.
12, but results were almost always worse using this parameter setting
because of the sensitivity of the DTW path to matching energy
contours with the 6-dB energy offset.)

An examination of the results given in Table Ia, which gives digit
error rates in percent for the top δ word candidates (δ = 1 to 6), shows
the following:

1. For each of the three consecutive pairs of runs (where each pair 
differs only in regard to the inclusion of the temporal energy contour), 
the inclusion of energy reduces the error rate in the top candidate by 
about 0.6 percent (±0.1 percent).

2. Applying VQ to the reference alone (Runs 3 and 4) increases the 
average digit error rate in the top candidate by about 0.4 percent. 
However, using energy, the error rate in the top candidate (2.4 percent) 
is still below the error rate for Run 1, the standard DTW recognizer 
without energy or VQ. 

3. Applying VQ to both test and reference patterns (Runs 5 and 6) 
increases the average digit error rate in the top candidate by about 1.3 
percent over that for Runs 1 and 2. In these cases the performance is 
degraded from that of the recognizer without either temporal energy 
or VQ. 
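
The differences quoted in items 1 to 3 can be re-derived from the top-candidate column of Table Ia. The short Python sketch below only does that arithmetic. The dictionary and function names are illustrative, and the error rates are transcribed from the table, so they inherit any uncertainty in the scanned original.

```python
# Top-candidate digit error rates in percent, transcribed from Table Ia (TS1).
# Keys are run numbers; the name TS1_TOP1 is illustrative only.
TS1_TOP1 = {1: 2.7, 2: 2.1, 3: 3.2, 4: 2.4, 5: 4.0, 6: 3.5}


def energy_gain(no_energy_run, energy_run, rates=TS1_TOP1):
    """Error-rate reduction (percentage points) from adding the energy contour."""
    return round(rates[no_energy_run] - rates[energy_run], 1)


def vq_penalty(baseline_runs, vq_runs, rates=TS1_TOP1):
    """Average error-rate increase of the VQ runs over the matching baseline runs."""
    diffs = [rates[v] - rates[b] for b, v in zip(baseline_runs, vq_runs)]
    return round(sum(diffs) / len(diffs), 2)


# Pairs of runs that differ only in the use of the energy contour.
gains = [energy_gain(a, b) for a, b in [(1, 2), (3, 4), (5, 6)]]
# VQ on the reference only (Runs 3, 4) versus no VQ (Runs 1, 2).
single_split_penalty = vq_penalty((1, 2), (3, 4))
# VQ on both test and reference (Runs 5, 6) versus no VQ (Runs 1, 2).
double_split_penalty = vq_penalty((1, 2), (5, 6))
```

With the values as transcribed, the three energy gains are 0.6, 0.8, and 0.5 percentage points, and the single-SPLIT penalty averages 0.4 percentage points.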


3.2 Results on digits (TS2) 


The results for the same six runs using the 2000 digits of TS2 are 
given in Table Ib. For this set of data the average digit error rate for 
Run 1 is about 1 percent higher than for TS1 data. This is due to the 
inclusion of talkers in the database with higher than average error 
rates. When energy is included in the recognizer (without VQ), the 
average top candidate error rate falls by 0.8 percent. 

The results of Runs 3 and 4 show that using VQ on the reference 
patterns alone leads to a small increase in error rate (0.2 percent) for
the case without energy and a larger increase in error rate (1.4 percent) 
for the case with energy. In these two cases, the runs using the energy 
contour provided essentially the same performance as the run without 
the energy contour. 

Runs 5 and 6 show a slight increase in error rate for the case without 
energy (Run 5 compared to Run 3) and a slight decrease in error rate 
for the case with energy (Run 6 compared to Run 4). For these two 
cases, the performance is essentially identical. 

A summary set of curves showing the error rate versus candidate 
position, δ, for the best sets of results of Table I is given in Fig. 3,
where, for each consecutive pair of runs, we have plotted the best 
results. These sets of curves show the slight degradations introduced 
by applying a VQ to the reference alone and to both the reference and 
test patterns. 
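
The error rate at a given candidate position can be computed as sketched below: a test word counts as an error at position k if the correct word is not among the k best-scoring (lowest-distance) vocabulary entries. This is a common definition and is assumed here; the source does not give its scoring code. The function name, the data layout, and the toy distances are all hypothetical.

```python
def error_rate_by_position(trials, max_position):
    """Return error rates (percent) for candidate positions 1..max_position.

    trials: list of (true_word, {word: distance}) pairs, where a smaller
            distance means a better match (as with a DTW distance).
    """
    errors = [0] * max_position
    for true_word, distances in trials:
        # Rank vocabulary words from best (smallest distance) to worst.
        ranked = sorted(distances, key=distances.get)
        for position in range(1, max_position + 1):
            if true_word not in ranked[:position]:
                errors[position - 1] += 1
    return [100.0 * e / len(trials) for e in errors]


# Toy example with made-up distances (not data from the paper).
toy_trials = [
    ("one", {"one": 0.21, "nine": 0.35, "five": 0.60}),   # correct word ranked 1st
    ("nine", {"one": 0.30, "nine": 0.33, "five": 0.52}),  # correct word ranked 2nd
    ("five", {"one": 0.40, "nine": 0.45, "five": 0.41}),  # correct word ranked 2nd
    ("one", {"one": 0.25, "nine": 0.50, "five": 0.55}),   # correct word ranked 1st
]
rates = error_rate_by_position(toy_trials, 3)  # [50.0, 0.0, 0.0]
```

Under this definition the error rate can never rise as the candidate position increases, which matches the monotone shape of the rows in Table I.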


3.3 Results on airlines words (TS3) 


For the airlines vocabulary a set of four runs was made. These runs
correspond to the first four runs on the digits vocabulary. No tests 
were made with VQ of both test and reference patterns for this 
vocabulary. The results of the four runs are given in Table II and 
plotted in Fig. 4. 

The results show that using energy contours for this medium-size, 
complex vocabulary led to essentially no significant improvement in 
performance for either of the pairs of runs. For the case of no VQ of 
the references, the performance with energy was 0.2 percent worse 
than without energy; for the case of using VQ on the references, the
performance with energy was 0.5 percent better than without energy. 

It can also be seen that using VQ of the references led to a 4- to 4.5- 
percent increase in error rate for the top candidate and somewhat 
smaller increases for higher-position candidates. These results indicate 
that a VQ with 128 code-book entries is just too small for a vocabulary 
of this size and complexity. 


IV. DISCUSSION OF RESULTS 


The results presented in Section III showed the following: 

1. The addition of the temporal energy contour as an additive 
feature to the LPC vector generally improved the performance of the 
recognition system by a small amount. This result was more the case 
for the digits vocabulary than for the airlines vocabulary. 

2. For the digits vocabulary, using VQ on just the reference pattern
(the single-SPLIT case) slightly increased the error rate; for the
airlines vocabulary a significant increase in error rate occurred,
indicating that the size of the VQ was too small for the size and
complexity of the vocabulary.

3. For the digits vocabulary, using VQ on both the test and reference
patterns (the double-SPLIT case) increased the error rate to a larger
degree.

These results indicate that using VQ on just the reference pattern,
combined with using the energy contour as an additional feature, can
lead to a recognition system with only marginally poorer performance
than the system without VQ (at least for the digits vocabulary), and
we assert that if a large enough VQ were used for the airlines
vocabulary (e.g., M* on the order of 512), similar results would have
been attained.


Fig. 3—Average word error rate versus candidate position for the digits vocabulary
for (a) TS1 and (b) TS2.


Table II—Average word error rates (percent) for tests on airline
vocabulary

                          Error Rate (%) for Top δ Candidates
Run      Energy   VQ
Number   Used     Ref.     1      2     3     4     5
1        No       No      10.0   4.5   2.9   2.2   1.6
2        Yes      No      10.2   4.2   2.4   1.9   1.6
3        No       Yes     14.5   6.0   3.5   2.6   2.1
4        Yes      Yes     14.0   6.3   4.0   3.0   2.2


[Fig. 4 plot: average word error rate in percent versus candidate position, δ. Legend: Run 2 (no VQ, E); Run 3 (VQ_REF, no E); Run 4 (VQ_REF, E).]

Fig. 4—Average word error rate versus candidate position for the airlines vocabulary.


The basic results are consistent with the findings of Shikano,® who
studied a speaker-trained system with a large vocabulary (641 words) 
of Japanese city names, and with earlier work on the digits vocabu- 
lary.” 

If we compare the results reported here to alternative computation- 
ally efficient approaches—such as the method proposed by Shore and 
Burton,’ or the Hidden Markov Model (HMM) approach’?—we see 
that the performance of the DTW approach using a VQ and temporal 
energy is significantly better than the alternatives. Shore and Burton 
presented results on a 20-word vocabulary (consisting of the digits and 
ten control words) for an eight-male talker population and had a 12- 
percent word error rate; for the digits vocabulary the error rate was 
still 4.1 percent. Rabiner et al.’* reported error rates of about 3.5 
percent for TS1 data using the HMM approach on a digits vocabulary, 
and about 15 percent on the airlines vocabulary. Thus, at the current 
time, since the computation of the LPC recognizer with DTW proc- 
essing using energy and VQ is comparable to that of alternative 
approaches, and since its performance is better, it is the most attractive 
proposal for significant reductions in computation in an isolated word 
recognizer. 

A key issue when using a VQ in the recognizer is the savings in 
computation over the conventional DTW approach without VQ. To 
quantify this concept, we define the following terms: 


M* = Number of vectors in code book 
V = Number of vocabulary words 
Q = Number of reference templates per vocabulary word 
L = Average number of frames in a word 


p = Order of LPC analysis. 


For the conventional DTW approach, the computation in each DTW
(for each reference pattern) is approximately:

$$C_{DTW} = C_{DIST} + C_{COMP} \tag{11a}$$

$$C_{DIST} = \frac{L^2}{3}\,(p + 1)\ (*, +), \tag{11b}$$

where $C_{DIST}$ is the computation for distances in the DTW, and $C_{COMP}$
is the computation for combinatorics. In a serial processor the
computation for combinatorics is on the order of one-fifth the
computation for distances. Hence, a good approximation is

$$C_{DTW} \approx \frac{2L^2}{5}\,(p + 1)\ (*, +) \tag{12}$$

per DTW, or a total of

$$C_{DTW}^{\mathrm{total}} = C_{DTW} \cdot V \cdot Q = \frac{2L^2}{5}\,(p + 1)\,V \cdot Q\ (*, +) \tag{13}$$

to recognize a test word.

Using the VQ in the recognizer leads to a front-end computation
load of

$$C_{VQ} = M^{*} L\,(p + 1)\ (*, +) \tag{14}$$

to code the L frames of the test, and a reduction of computation in
the DTW to

$$C_{DTW/VQ} = \frac{L^2}{15}\,(p + 1)\,V \cdot Q\ (*, +) \tag{15}$$

since all distance computation is eliminated.

The ratio of computation, R, of the DTW with VQ to the DTW
without VQ is given as

$$R = \frac{C_{DTW/VQ} + C_{VQ}}{C_{DTW}^{\mathrm{total}}} \tag{16a}$$

$$R = \frac{\frac{L}{15}\,V \cdot Q + M^{*}}{\frac{2}{5}\,L \cdot V \cdot Q}. \tag{16b}$$

For the digits vocabulary, with L = 40, M* = 128, p = 8, V = 10,
and Q = 12, we get

$$R_{DIGITS} = \frac{128 + 320}{1920} = 0.233,$$

i.e., a 4.3 to 1 reduction in computation. For the airlines vocabulary,
we get about a 5.5 to 1 reduction in computation (even if we make
M* = 512). Hence, we conclude that the inclusion of VQ in the DTW-
based word recognizer can indeed significantly reduce the computation
without significantly lowering recognizer performance.
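
The operation counts above are easy to check numerically. The sketch below assumes the per-template distance count is (L²/3)(p + 1) multiply-adds, that combinatorics costs one-fifth of that, and that the VQ front end costs M*·L·(p + 1); the function names are hypothetical and `M` stands for the code-book size M*.

```python
def dtw_ops(L, p, V, Q):
    """Approximate multiply-adds for conventional DTW over all V*Q templates.

    Per template: distances ~ (L**2 / 3) * (p + 1), plus about one-fifth of
    that for combinatorics, i.e. (2/5) * L**2 * (p + 1) in total.
    """
    return (2.0 / 5.0) * L**2 * (p + 1) * V * Q


def vq_dtw_ops(L, p, V, Q, M):
    """Approximate multiply-adds with VQ: coding the test word plus combinatorics."""
    coding = M * L * (p + 1)                          # quantize the L test frames
    combinatorics = (L**2 / 15.0) * (p + 1) * V * Q   # distance computation is eliminated
    return coding + combinatorics


def computation_ratio(L, p, V, Q, M):
    """Ratio R of computation with VQ to computation without VQ."""
    return vq_dtw_ops(L, p, V, Q, M) / dtw_ops(L, p, V, Q)


# Digits vocabulary parameters quoted in the text.
r_digits = computation_ratio(L=40, p=8, V=10, Q=12, M=128)
```

For the digits parameters this gives R = 448/1920, about 0.233. The factor (p + 1) cancels in the ratio, so under these assumptions R does not depend on the LPC order.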


V. SUMMARY 


In this paper we show that by adding a vector quantization stage to
the standard DTW-based isolated word recognizer, and by incorporating
temporal energy as an additional feature to the LPC vector, a
high-performance word recognizer with significantly reduced
computation can be implemented. By using the so-called single-SPLIT
methods, in which the VQ is applied directly only to the reference patterns,
we show that the resulting VQ distortion can be made sufficiently
small that only an insignificant increase in word error rate results.
The Overload Performance of Engineered
Networks With Nonhierarchical and 
Hierarchical Routing 
We report the results of a study of the performance of engineered
nonhierarchical and hierarchical routing networks under overloads. This study was
motivated by results obtained from mathematical models for small, symmetric, 
uniformly loaded, nonhierarchical networks with transparent switching sys- 
tems, showing the existence of network instabilities. We extend the mathe- 
matical models to more general nonhierarchical networks, and show with 
analysis and an extant simulation model that such instabilities are also found 
in nonsymmetric, nonhierarchical networks. We then use our models to 
consider whether engineered nonhierarchical networks exhibit such unstable 
behavior. No instabilities are found in the engineered nonhierarchical net- 
works considered here. However, the nonhierarchical networks consistently 
demonstrate a drop in carried load between 10- and 15-percent overloads. Our 
analysis of comparably engineered hierarchical networks shows that these 
networks do not exhibit a drop in carried load under overloads (in the absence 
of switching system dynamics). Finally, we show that using trunk reservation 
for first-routed traffic allows the formulation of a control strategy that provides 
a high level of network carried load during overloads. 


I. INTRODUCTION 


Hierarchical routing has been used since the early days of the toll 
network. Before the early 1960s, hierarchical routing was very advan- 
tageous for a number of reasons. First, it allowed switching systems 
to determine the path for a call very simply and quickly, using only
the call’s destination. Furthermore, the call did not loop back to a
switching system previously traversed. Second, hierarchical routing
combined small traffic parcels on efficient final trunk groups. Third,
hierarchical routing networks were relatively easy to engineer.¹

Recent advances in switching and signaling technology, as well as 
the tremendous growth in toll traffic, have provided new incentives to 
increase network design efficiency. Since the mid-1970s, efforts have 
been under way at AT&T Bell Laboratories to develop a new method 
of engineering large-scale nonhierarchical networks. These efforts 
culminated in the unified algorithm,” which takes advantage of traffic 
noncoincidence and routes calls over least-cost paths. This results in 
nonhierarchical networks, which are less expensive than hierarchical 
networks. In the nonhierarchical networks, a call can theoretically use 
any path connecting its origination and destination (although in the 
unified algorithm only 1- and 2-link paths are allowed). A call blocked 
at an intermediate switching system is cranked back to its origin so it 
can take the next path in its route. In addition, the unified algorithm 
allows routing to take advantage of time-sensitive load variations.® 

The nonhierarchical networks give service comparable to that of 
hierarchical networks under engineered traffic conditions. However, 
it was necessary to promote a better understanding of the performance 
of nonhierarchical networks under other traffic conditions. In partic- 
ular, there was concern that network instabilities that existed in small 
symmetric nonhierarchical networks might also exist in nonhierarch- 
ical networks engineered using the unified algorithm. 

To help provide this understanding, we studied the performance of 
engineered nonhierarchical and hierarchical networks under general 
network overloads, using mathematical and simulation models. We 
also investigated the effect on these networks of one control, namely, 
trunk reservation for first-routed traffic. 


II. BACKGROUND


The concern about the stability of nonhierarchical networks was 
stimulated by the work of Krupp,⁴ and Nakagome and Mori.⁵ They
carried out approximate analyses of nonhierarchical routing applied 
to small, uniformly loaded networks, which are easily analyzed because 
of their simple symmetric designs. Krupp also considered a nonsym- 
metric 3-node model. Their models did not include switching system 
dynamics. Their analyses revealed, in some cases, the existence of 
network instabilities when trunk-group (or network) blocking proba- 
bility is considered as a function of offered load. That is, for certain 
loads the network has two realizable states: (1) a low network blocking 
state, in which almost all calls use their shorter first-choice path; and 
(2) a congested state, in which a large proportion of calls use longer
alternate paths, and many calls are blocked. 

For example, Fig. 1 shows the relation between the offered load (per 
point-to-point pair) and trunk-group blocking obtained for a 10-node 
symmetric network with 100 trunks per trunk group, using their 
analyses. Each point-to-point pair has a direct first-choice path and 
eight 2-link alternate paths. The curve shows a range of offered loads 
that correspond to multiple blocking probabilities, indicating the po- 
tential for unstable behavior in this load range. To understand the 
instability more clearly, we used a simulation model developed by 
Krupp⁴ to simulate the 10-node network.

We first observed the number of calls carried by the network as 
offered load increased. Figure 2 shows the results of a simulation of 
the network with an initial load of 84 erlangs (starting with an empty 
network). The load was increased to 85 erlangs after 65 holding times. 
These loads are near the top of the multiple-valued region in Fig. 1. 
As Fig. 2 illustrates, a drastic change in carried load occurs after the 
offered load is increased. With the initial load, there is a fairly constant 
throughput of approximately 3750 calls. The trunk-group blocking is 
low, and few calls are alternate-routed. After the load change, the 
number of alternate-routed calls increases—first only slightly, and
then dramatically. The average number of calls carried falls to 3000
and half the carried calls are alternate-routed, indicating a high
trunk-group blocking. This agrees very closely with the predicted load at
which the transition from low to high blocking should occur as offered
load increases.


[Fig. 1 plot: point-to-point load versus trunk-group blocking (0 to 0.8); 100 trunks per trunk group, 8 alternate paths per node pair.]

Fig. 1—Performance of a 10-node symmetric network.


Fig. 2—Simulation of a 10-node symmetric network with load = 84 erlangs/pair; load
increased to 85 erlangs/pair.

Another consequence of the mathematical solution is illustrated in 
Fig. 3, which shows the results of a simulation run for the same 
network with an initial point-to-point offered load of 90 erlangs, 
starting with an empty network. After approximately 14 holding times, 
the load is dropped to 80 erlangs. This simulation was conducted to 
determine the behavior of the network if a load in the multiple-valued 
region is offered while the network is congested. The initial load of 90 
erlangs was used to congest the network, with a resulting carried load 
of only 3000 calls. When the load is dropped to 80 erlangs, congestion 
persists for another 16 holding times before the carried load increases 
to 3600 calls, corresponding to operation in an uncongested state. 

Fig. 3—Simulation of a 10-node symmetric network with load = 90 erlangs/pair; load
dropped to 80 erlangs/pair.


These results raised interest in the performance of nonhierarchical
routing in more general networks. To address this question, we
developed a mathematical model that extends the models in Refs. 4 and 5
to more general nonhierarchical networks. This model, described in
the Appendix, analyzes network performance both without controls
and with trunk reservation. We also used a simulation model for
nonhierarchical networks developed by Krupp. The mathematical
model is solved iteratively, starting with an initial estimate of the
trunk-group blocking probabilities, to determine the trunk-group
offered loads and blocking probabilities in equilibrium. The presence of
network instabilities is demonstrated by the existence of multiple
solutions for the same parameters (loads, trunk-group sizes, routing)
obtained by using different initial estimates of the trunk-group
blocking probabilities.
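
The idea of solving iteratively from different initial blocking estimates can be sketched for a symmetric network. The Python code below is not the Appendix model. It is a generic reduced-load (Erlang fixed-point) approximation under assumptions made here for illustration: every trunk group blocks independently with the same probability b, a two-link path is blocked with probability 1 - (1 - b)², and the load offered to a trunk group is its carried load divided by (1 - b). All function names are hypothetical.

```python
def erlang_b(load, trunks):
    """Erlang B blocking probability via the standard stable recursion."""
    b = 1.0
    for k in range(1, trunks + 1):
        b = load * b / (k + load * b)
    return b


def link_blocking(pair_load, trunks, alt_paths, b_start, iters=500, damping=0.5):
    """Iterate a reduced-load fixed point for a symmetric network.

    Assumed model (illustrative only): each node pair offers `pair_load`
    erlangs to its direct trunk group; blocked calls try `alt_paths`
    two-link alternates, and all trunk groups block independently with
    the same probability b.
    """
    b = b_start
    for _ in range(iters):
        two_link_blocked = 1.0 - (1.0 - b) ** 2
        # Overflow traffic that finds a free alternate; each such call holds two links.
        alt_carried = pair_load * b * (1.0 - two_link_blocked ** alt_paths)
        # Offered load that would yield this carried load on one trunk group.
        offered = pair_load + 2.0 * alt_carried / (1.0 - b)
        # Damped update toward the Erlang B blocking at that offered load.
        b = (1.0 - damping) * b + damping * erlang_b(offered, trunks)
    return b


def solutions(pair_load, trunks=100, alt_paths=8):
    """Solve once from a low and once from a high initial blocking estimate."""
    low = link_blocking(pair_load, trunks, alt_paths, b_start=0.0)
    high = link_blocking(pair_load, trunks, alt_paths, b_start=0.9)
    return low, high
```

If the two returned values differ for some load, that load lies in a multiple-solution region of this toy model. Whether and where that happens depends on the assumptions listed above, so the sketch should not be expected to reproduce the curve of Fig. 1 exactly.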

To establish the existence of network instabilities in more general 
nonhierarchical networks, we applied our models to an 8-node, non- 
symmetric, nonuniformly loaded, nonhierarchical network. The num- 
ber of trunks in each trunk group ranged fairly uniformly from 50 to 
995 trunks. It should be noted that this was not an engineered network. 
Point-to-point loads were chosen so that each point-to-point pair 
would experience a blocking no greater than 0.005 on the direct trunk 
group in the absence of alternate-routing, and an overload means 
additional load above these loads. Each point-to-point pair was given
a direct first-choice path and four 2-link alternate paths. 
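
The load selection described above, a direct-group blocking of at most 0.005 without alternate routing, can be sketched with the Erlang B formula and a bisection search. Using Erlang B for the direct trunk group is an assumption made here; the function names and the search bounds are hypothetical.

```python
def erlang_b(load, trunks):
    """Erlang B blocking probability via the standard stable recursion."""
    b = 1.0
    for k in range(1, trunks + 1):
        b = load * b / (k + load * b)
    return b


def max_load_for_blocking(trunks, target=0.005, tol=1e-6):
    """Largest offered load (erlangs) whose Erlang B blocking is at most `target`."""
    lo = 0.0
    hi = 2.0 * trunks + 10.0  # generous upper bound; blocking here far exceeds 0.005
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if erlang_b(mid, trunks) <= target:
            lo = mid   # still within the blocking objective
        else:
            hi = mid   # too much load
    return lo


# Example: a trunk group at the small end of the 50-to-995 range in the text.
load_50 = max_load_for_blocking(50)
```

An overload of x percent in the sense of the text would then offer (1 + x/100) times this load to each pair.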

We considered the performance of this network under general 
overloads of up to 10 percent, as shown in Fig. 4. The function relating 
percent overload to network blocking is double-valued for overloads 
of up to 4 percent, indicating the existence of instabilities for loads in 
this range. (Solutions corresponding to low network blocking were 
obtained by starting with low trunk-group blocking estimates, while 
solutions corresponding to high network blocking were obtained by 
starting with high trunk-group blocking estimates.) These results were 
substantiated by simulation. For example, Fig. 5 shows a simulation 
at 1-percent overload, starting with an empty network. Initially, the 
network experiences low blocking, with carried load at about 13,500. 
Then, after about 17 holding times, the network enters the congested 
state; carried load drops to 11,000, and a large proportion of calls are 
alternate-routed.

Now that we had developed a methodology, and demonstrated that 
instabilities occur in more general nonhierarchical networks, we were 
ready to address the following question: Does the type of instability 
seen in these small nonhierarchical networks occur in engineered, 
nonhierarchical networks? To answer this question, we applied our 
models to three representative network models. These models are 
described in Section III. 


[Fig. 4 plot: network blocking versus percent overload (0 to 10); legend: mathematical model.]

Fig. 4—Network blocking for an 8-node nonhierarchical network.


Fig. 5—Simulation of an 8-node network at 1-percent overload.


III. NETWORK MODELS


Three network models were used in our study. The 30-node network 
model was developed by Gechter and Modarressi at AT&T Bell Lab- 
oratories to represent a 4ESS* network. It consists of the 10 regional 
centers and 20 of the sectional centers in the then existing hierarchical 
network, and is based on 1977-1978 Trunk Servicing System data. 
Other characteristics of the network include a wide geographic disper- 
sion of the switching systems, large point-to-point traffic parcels, and 
fairly uniformly distributed point-to-point loads. 

The 25-node network model is made up of a subset of the switching 
systems that comprise the 215-node hybrid network model recently 
developed at AT&T Bell Laboratories. The 215-node network was 
engineered using loads for October 1989, projected from 1978 Central- 
ized Message Data System data. This network includes the 140 4ESSs 
planned for deployment, as well as 75 stored-program-controlled toll 
tandem switching systems. 


* Trademark of AT&T Technologies, Inc. 
The 140-node network model consists of the 140 4ESSs in the 215-node
network.

Versions of the two smaller network models engineered for both 
nonhierarchical and hierarchical routing were studied, while the 140- 
node network model was only considered with nonhierarchical routing. 
The 25- and 140-node networks were designed using the unified 
algorithm.” The 30-node nonhierarchical network was engineered using
another nonhierarchical network design developed at AT&T Bell
Laboratories. An average of 7-percent reserve capacity was added to 
the trunk groups of the nonhierarchical networks. For the 30- and 25- 
node network models, the nonhierarchical network has about 9 percent 
fewer trunks than its corresponding hierarchical network. The non- 
hierarchical 25-node network was the initial choice for the 1986 Phase 
One deployment of dynamic nonhierarchical routing (DNHR). The 
140-node nonhierarchical network represented the 1989 DNHR net- 
work envisioned in the predivestiture environment. 

Comparisons of the trunk-group sizes (number of trunks per trunk 
group) and route sizes (number of paths per route) for the 25-, 30-, 
and 140-node nonhierarchical networks are shown in Tables I and II. 
The 25- and 30-node nonhierarchical networks bound the 140-node 
nonhierarchical network both in terms of trunk-group sizes and route 
sizes. 


IV. RESULTS WITHOUT CONTROLS 
4.1 Introduction 

We applied the mathematical and simulation models to the 30-, 
25-, and 140-node networks. Network performance was investigated 
under both uniform and nonuniform general overloads. All three 
network models were studied under uniform general overloads, which 
were obtained by multiplying the engineered loads by a constant factor. 


Table I—Trunk-group size distributions for
the nonhierarchical networks

          Proportion of Trunk-Group Sizes ≤ x
          25-Node    30-Node    140-Node
   x      Network    Network    Network
  100     0.346      0.928      0.746
  200     0.580      0.986      0.868
  300     0.708      1.000      0.913
  400     0.786                 0.938
  500     0.847                 0.951
  600     0.871                 0.960
  700     0.902                 0.967
  800     0.932                 0.972
  900     0.939                 0.977
 1000     0.949                 0.980


Table II—Route size distributions for the
nonhierarchical networks

          Proportion of Route Sizes ≤ x
          25-Node    30-Node    140-Node
   x      Network    Network    Network
   1      0.213      0.012      0.028
   2      0.385      0.021      0.100
   3      0.584      0.045      0.242
   4      0.791      0.101      0.407
   5      0.929      0.196      0.565
   6      0.976      0.297      0.718
   7      0.993      0.377      0.816
   8      1.000      0.441      0.883
   9                 0.524      0.926
  10                 1.000      0.971


In addition, the 30-node network was studied under nonuniform 
general overloads, which were thought to emulate a more realistic 
overload situation. To represent the nonuniform overloads, we used a 
set of high-day loads for the 30-node network. The high-day loads 
were generated using the distribution derived for the ratio of the high- 
day loads to the average-business-day (engineered) loads and a Gamma 
distribution to achieve the proper ratio of the peak-day loads to the 
average of the ten-high-day loads. The high-day loads, which are on 
average 5 percent higher than the engineered loads, were used to 
represent a 5-percent overload. Other overloads were obtained by 
multiplying these loads by the appropriate factor. Uniform and non- 
uniform general overloads of up to 200 percent were considered. As 
we discuss below, the mathematical and simulation models give excel- 
lent agreement, indicating that the assumptions in the mathematical 
model do not distort the fundamental network behavior. 
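
The two ways of building overload scenarios can be sketched as follows. Uniform overloads multiply every engineered load by one factor, as stated above. For the nonuniform case the text says only that other overloads were obtained by multiplying the high-day loads by "the appropriate factor"; the factor used below, (1 + x/100)/(1.05), is an assumption, as are the function names and the toy loads.

```python
def uniform_overload(engineered_loads, percent):
    """Scale every engineered point-to-point load by the same factor."""
    factor = 1.0 + percent / 100.0
    return {pair: load * factor for pair, load in engineered_loads.items()}


def nonuniform_overload(high_day_loads, percent, base_percent=5.0):
    """Scale a high-day load set, taken to represent `base_percent` overload.

    Assumption: the scaling factor is chosen so that the average overload
    moves from `base_percent` to `percent`.
    """
    factor = (1.0 + percent / 100.0) / (1.0 + base_percent / 100.0)
    return {pair: load * factor for pair, load in high_day_loads.items()}


# Toy loads in erlangs (made up for illustration, not data from the study).
engineered = {("A", "B"): 100.0, ("A", "C"): 50.0}
high_day = {("A", "B"): 110.0, ("A", "C"): 47.5}   # total is 5 percent above engineered

uniform_10 = uniform_overload(engineered, 10.0)
nonuniform_20 = nonuniform_overload(high_day, 20.0)
```

In the toy example the high-day set keeps its uneven pattern (one pair up, one pair down) at every overload level, while the total offered load matches that of the corresponding uniform overload.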


4.2 The 30-node network 


Figure 6 presents the results for the 30-node networks under uniform 
overloads. The symbols in the figures indicate results obtained from 
the mathematical and simulation models. For a given set of loads, the 
mathematical model always converged to the same solution, regardless 
of the initial trunk-group blocking estimates, indicating that no net- 
work instabilities exist. This is demonstrated by the single-valued 
function in Fig. 6. However, both the mathematical and simulation 
results show a striking difference in the performance of the nonhier- 
archical and hierarchical networks. The two networks show similar 
performance up to about a 10-percent overload, with carried load 
increasing with increasing offered load. At that point the number of 
calls carried in the nonhierarchical network falls sharply, because of 
an increase in the number of multilink calls. The drop continues until 
around 100-percent overload, where the carried load begins to increase
gradually. This increase results because the network has become
congested to the point that the probability of finding available trunks
for a multilink call is very small, so that now 1-link calls begin to be
favored over multilink calls. Such behavior is not seen in the
hierarchical network, because, intrinsically, it can better limit the number
of multilink calls under overloads. This is due to the presence of final
trunk groups and of primary high-usage trunk groups that do not carry
alternate-routed traffic, the absence of cranking back, and more trunks
than are in the nonhierarchical network. The number of calls carried
in the hierarchical network increases steadily as offered load increases
over the entire range of overloads considered.


Fig. 6—Performance of the 30-node network under uniform overloads.

Figure 7 demonstrates this point. Here we have plotted the ratio of 
the number of multilink calls to the number of 1-link calls as deter- 
mined from the mathematical model for various overloads. This ratio 
grows sharply for the nonhierarchical network for overloads of up to 
50 percent, then levels off and begins to decline. On the other hand, 
for the hierarchical network, this ratio rises slowly before leveling off 
at around 30-percent overload. These results are consistent with a 
simulation study by Weber using 3-, 4-, 5-, and 6-node networks, which 
showed that hierarchical networks perform more efficiently under 
overloads than nonhierarchical networks.® 

Figure 8 displays the performance of the 30-node network under the 
nonuniform overloads. Qualitatively, the results are the same as those 
obtained under uniform overloads. In fact, the difference in the number 
of calls carried under uniform and nonuniform overloads is very small 
at the higher overloads. The nonhierarchical network under nonuniform
overloads also exhibits a drop in carried load at about 10-percent
overload, though not as severe. The drop is attenuated because, with
the nonuniform overloads, carried load cannot increase to the
maximum level seen with the uniform overloads.


Fig. 7—Ratio of multilink calls to 1-link calls for the 30-node network.


Fig. 8—Performance of the 30-node network under nonuniform overloads.


4.3 The 25-node network


Figure 9 presents the results for the 25-node network under uniform
overloads. As with the 30-node network, the 25-node nonhierarchical
network shows a drop in carried load: at about 10-percent overload in
the mathematical model and at about 15-percent overload in the
simulations. However, the drop is not as sharp as that seen in the
30-node nonhierarchical network. We attribute this to the smaller number
of paths per route in the 25-node nonhierarchical network (see Table
II), which limits the potential for a large number of multilink calls.
The hierarchical network exhibits a continuous increase in carried
load with increasing offered load.


Fig. 9—Performance of the 25-node network under uniform loads.


4.4 The 140-node network 


Figure 10 displays the performance of the 140-node nonhierarchical 
network under uniform overloads. These results were obtained from 
the mathematical model. No simulations of this network were made 
because of its large size. The results agree qualitatively with those 
derived for the 30- and 25-node nonhierarchical networks. Again, 
carried load declines at around 10-percent overload and increases at 
the larger overloads. Figure 10 also shows the number of 1-link calls 
in the network as derived from the mathematical model. The direction 
of change in the number of 1-link calls is almost always the same as 
the direction of change in the total number of calls. Again we conclude 
that the degree to which network capacity is efficiently used is related 
to the network’s ability to favor 1-link calls over multilink calls. 


Fig. 10—Performance of the 140-node network under uniform overloads.


4.5 Comparison of the nonhierarchical networks


Our analysis of the engineered nonhierarchical networks without
controls shows that the type of instability discussed in Section II does
not occur—the offered load versus carried load functions are all
single-valued. We have seen, however, that the nonhierarchical networks
without controls demonstrate a drop in carried load under overload 
but with different degrees of severity. The most severe drop occurs in 
the 30-node network, where carried load drops 8.37 percent below the 
level at 10-percent overload before leveling off (based on the mathe- 
matical model). The 25- and 140-node networks demonstrate respec- 
tive drops of 0.58 and 1.13 percent. Krupp has shown that, for 
symmetric nonhierarchical networks, all other things being equal, 
performance under heavy loads worsens as the number of trunks per 
trunk group or the number of paths per route increases (see also 
Weber®). Thus, we might expect the difference in the severity of the 
poor performance in our networks to be related to these variables. 
From Table I, which shows the distribution of trunk group sizes for 
the three nonhierarchical networks, it is clear that network overload 
performance does not worsen as trunk group size increases. In fact, 
the 30-node network, which shows the worst overload performance, 
has an average of only 40.4 trunks per trunk group, while the 25- and 
140-node networks have, respectively, 320.7 and 132.6 trunks per trunk 
group. 

However, this should not be interpreted as a contradiction of 
Krupp’s results, since there are other significant differences in the 
networks. In particular, the networks vary widely in their route sizes 
(number of paths per route), as shown in Table II. The 30-node 
network provides on average 6.91 alternate paths per route, the 25- 
node network provides 2.12, and the 140-node network provides 4.41. 
Thus, we see a strong relationship between route size and poor overload
performance, and we can conclude that route size is a better predictor
of network overload performance than trunk group size. This further
substantiates the conclusion that the reason for the poor overload 
performance of the nonhierarchical networks is the large amount of 
alternate routing under overloads made possible by the large number 
of alternate paths. 


V. RESULTS WITH TRUNK RESERVATION FOR FIRST-ROUTED TRAFFIC 


As we know from the study of hierarchical networks, appropriate 
network controls are an effective way to mitigate poor network per- 
formance under nonengineered conditions. Since the poor network 
performance seen above is related to inefficient use of trunks for 
alternate routing, trunk reservation for first-routed traffic appeared 
to be an effective way to control our test networks. On each trunk 
group we reserved 5 percent of the trunks, with a minimum of one 
trunk, for first-routed traffic. Only the effect on network carried load 
was considered. Other variants of the trunk-reservation control, as 
well as the impact on point-to-point blockings, have been investigated 
but are not discussed here. 

The effects of trunk reservation on the 30-node networks are typical 
of all the networks considered. The results under uniform overloads 
are shown in Fig. 11. By comparing Figs. 6 and 11, we see that trunk 
reservation has a very beneficial effect with nonhierarchical routing 
under large overloads. The drop in carried load with increasing offered 
load without controls disappears with the use of trunk reservation. 
Trunk reservation limits the number of multilink calls, allowing more 
efficient use of the trunks. Similarly, the performance of the hierar- 
Fig. 12—Performance of the 30-node network under nonuniform overloads with trunk
reservation. 


chical network under large overloads is improved, although the im- 
provement is not as dramatic as for the nonhierarchical network. At 
overloads of less than 10 percent, though, two negative effects appear: 
(1) carried load actually decreases slightly since some alternate-routed 
calls are prevented from accessing paths that have been engineered to 
carry them, and (2) some point-to-point blockings are increased, 
distorting servicing measurements used to schedule trunking augmen- 
tations. These effects can be eliminated by using a triggering mecha- 
nism to allow the trunk-reservation control only when overloads are 
large enough to preclude these effects.” 

Figure 12 shows the trunk reservation results for the 30-node 
networks under the nonuniform overloads. The effect of the control 
under large overloads for both the nonhierarchical and hierarchical 
networks is essentially the same as that seen under uniform overloads. 
One contrast to the uniform overload case is that now trunk reserva- 
tion is usually beneficial even at the lighter overloads. Reduction in 
carried load with trunk reservation is extremely small. 


VI. SUMMARY 


Analysis of small, symmetric, uniformly loaded nonhierarchical 
networks has shown the existence of network instabilities for certain 
networks when only trunking behavior is considered. We have dem- 
onstrated by example that such instabilities persist even when the 
assumptions of symmetry and uniformly distributed loads are re- 
moved. Using a mathematical model that we developed, together with 
a simulation model, we have studied the performance of three realistic 
network models under overloads. The 25- and 140-node nonhierarchical
networks were designed using the unified algorithm. The 30-node
nonhierarchical network was engineered using another design algo- 
rithm. In addition, two of the models were studied with hierarchical 
routing. Results obtained demonstrate that, without controls, the 
performance of nonhierarchical networks is inferior to that of hierar- 
chical networks under overloads. No instabilities of the type described 
in Refs. 4 and 5 are seen in any of the engineered nonhierarchical 
networks, but a drop in carried load consistently occurs at about 10- 
percent overload because of the tendency of the nonhierarchical net- 
works to alternate-route calls when overloaded. The severity of this 
drop appears to be correlated with the number of paths per route; 
nonhierarchical networks with more paths per route exhibit greater 
throughput degradation under overloads. Such behavior is not seen in 
the hierarchical networks, which intrinsically can better limit the 
amount of alternate routing under overloads. An important assump- 
tion in our models is the absence of switching system effects. Other 
studies have shown that, when these effects are included, the perform- 
ance of networks under large overloads changes significantly from that 
given by our analysis. However, our results indicate that engineered 
nonhierarchical networks without controls exhibit unsatisfactory per- 
formance even at overloads of 10 to 15 percent, where switching system 
dynamics are not likely to be an important factor. 

We also have applied a control, namely, trunk reservation, for first- 
routed traffic, to the network models. This control improves the 
performance of the nonhierarchical and hierarchical networks. By 
diminishing the amount of alternate routing in the networks under 
large overloads, trunk reservation permits 1-link calls to use the trunks 
more efficiently. For the nonhierarchical networks, this results in a 
continuous increase in carried load with increasing offered load over 
the entire range of overloads considered. However, at very light over- 
loads, carried load may drop when trunk reservation is activated. This 
suggests the use of a triggering mechanism to impose trunk reservation 
only at larger overloads. 
APPENDIX
Mathematical Models 
A.1 Models without controls 


The model for the nonhierarchical networks without controls re- 
quires that a network be specified, together with trunk-group sizes, 
point-to-point offered loads, and a fixed route for each point-to-point 
pair. It is assumed that (1) the mixture of traffic offered to a trunk
group is Poisson, (2) trunk-group blocking probabilities are independ- 
ent, and (3) the time needed to make connections (setup time) is small 
enough, relative to the average holding time of calls, to be ignored. 
The effect of the first two assumptions can be gauged by comparison
with the simulation results, since the simulator does not make these
assumptions.

We permit the trunk-group sizes and the offered load and the 
number of paths for each point-to-point pair, as well as the number 
of trunk groups in a path, to be arbitrary. We also assume that a call 
blocked on a trunk group of a path can always be cranked back to the 
originating office so that the call can access the next path in its route. 

In the discussion that follows, the term path means a set of distinct 
trunk groups that form a connection between two nodes. A route is an 
ordered collection of paths connecting the same point-to-point pair, 
specifying the paths used for routing calls between the pair in the 
order that seizure of the paths is attempted. 

Before giving the details of the model, we introduce some notation.
Let $L^j$ be the offered load for point-to-point pair $j$ and let $L = \sum_j L^j$
be the total offered load. Let $p_i$, $n_i$, and $a_i$ denote, respectively, the
blocking probability, trunk-group size, and offered load for trunk group
$i$ (in erlangs), and let $q_i = 1 - p_i$. We denote a path by $r$, a route by $R$,
the route for point-to-point pair $j$ by $R^j$, and the route formed by the
first $k$ paths of $R^j$ by

$$R^j_k = (r^j_1, \ldots, r^j_k).$$


Finally, we define D(R) to be the probability that route R is blocked, 
i.e., each path in R has at least one blocked trunk group. 

The basic idea of our analysis is to determine the offered load $a_i$ for
trunk group $i$ as a function of the trunk-group blockings. For each
route containing trunk group $i$, we determine the contribution that
the route makes to the total trunk-group offered load. Suppose trunk
group $i$ is in path $r^j_k$, the $k$th path in the route for point-to-point pair
$j$. The load carried by $r^j_k$, $c^j_k$, is given by

$$c^j_k = L^j\,[D(R^j_{k-1}) - D(R^j_k)], \qquad (1)$$

which is the point-to-point load for pair $j$ that overflows the first
$k - 1$ paths but not the $k$th path. The load $c^j_k$ contributes to the carried
load for each trunk group $i \in r^j_k$. The total carried load for trunk group
$i$, $K_i$, is obtained by considering all paths containing trunk group $i$ and
is given by

$$K_i = \sum_{j,k:\; i \in r^j_k} c^j_k.$$

From the relation

$$K_i = a_i q_i$$

for Poisson traffic, we immediately obtain

$$a_i = \sum_{j,k:\; i \in r^j_k} c^j_k \big/ q_i. \qquad (2)$$

In addition to the relations given by (2), based on our assumption
of Poisson trunk-group offered loads, we relate the trunk-group
offered loads to the trunk-group blockings by the Erlang-B formula:

$$p_i = B(n_i, a_i). \qquad (3)$$


The equations given by (2) and (3) can be solved iteratively, starting 
with an initial estimate of the trunk-group blockings, to determine the 
trunk-group offered loads and blocking probabilities in equilibrium. 
[The calculation of D(R) is discussed below.] Once a solution has been 
obtained, several quantities of interest can be calculated. In particular, 
network blocking, $z$, is given by

$$z = \frac{\sum_j L^j D(R^j)}{L},$$

and network carried load, $C$, by

$$C = L(1 - z).$$


We now consider the calculation of $D(R)$, $R = (r_1, \ldots, r_k)$. If the
paths of $R$ are disjoint, then, by our independence assumption, the
blocking probabilities for the paths are independent, so that

$$D(R) = \prod_{t=1}^{k}\Bigl(1 - \prod_{i \in r_t} q_i\Bigr). \qquad (4)$$

For the nonhierarchical networks studied here, the paths have all been
restricted to contain either one or two trunk groups. This implies that 
all paths in a route are disjoint, and D(R) can be calculated using the 
formula above. 
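
The iteration over eqs. (2) and (3) described above can be sketched as follows. This is only an illustrative successive-substitution loop, not the authors' program: the function names, the data layout (a route as a list of paths, a path as a list of trunk-group indices), the zero-blocking starting guess, and the fixed iteration count are all assumptions made for the example.

```python
def erlang_b(n, a):
    """Erlang-B blocking for integer n trunks and offered load a (erlangs)."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def route_blocking(route, q, k=None):
    """D(R_k) of eq. (4): probability that the first k paths are all blocked.

    With k=0 the product is empty and the result is 1; k=None uses every path.
    """
    d = 1.0
    for path in route[:k]:
        path_free = 1.0
        for i in path:
            path_free *= q[i]
        d *= 1.0 - path_free
    return d


def solve_network(sizes, routes, loads, iterations=200):
    """Iterate eqs. (2) and (3); returns (trunk-group blockings, network blocking)."""
    p = [0.0] * len(sizes)              # assumed starting guess: no blocking
    for _ in range(iterations):
        q = [1.0 - pi for pi in p]
        carried = [0.0] * len(sizes)
        for route, load in zip(routes, loads):
            for k, path in enumerate(route, start=1):
                # eq. (1): load that overflows the first k-1 paths but not the kth
                c = load * (route_blocking(route, q, k - 1)
                            - route_blocking(route, q, k))
                for i in path:
                    carried[i] += c
        offered = [carried[i] / q[i] for i in range(len(sizes))]    # eq. (2)
        p = [erlang_b(n, a) if a > 0 else 0.0
             for n, a in zip(sizes, offered)]                       # eq. (3)
    q = [1.0 - pi for pi in p]
    blocked = sum(load * route_blocking(route, q)
                  for route, load in zip(routes, loads))
    return p, blocked / sum(loads)


# One pair with a direct path (group 0) and a two-link alternate (groups 1, 2).
blockings, z = solve_network([10, 10, 10], [[[0], [1, 2]]], [8.0])
```

Plain substitution is used here for brevity; on larger networks a damped update may be needed for convergence.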

The model used for the hierarchical networks without controls was 
taken from Ref. 8. We used only the trunk-group portion of the model, 
omitting the parts dealing with switching system dynamics, retrials, 
and DABY (don’t answer, busy). In fact, this abbreviated model differs 
from the nonhierarchical model only in the way in which trunk-group 
carried load is calculated. In the nonhierarchical-network model, this 
calculation incorporates crankback, while in the hierarchical model, 
final trunk groups are taken into account. 


A.2 Models with trunk reservation for first-routed traffic 


Network performance was also modeled with trunk reservation for 
first-routed traffic. Under this control, a threshold is specified for each 
trunk group, and alternate-routed calls attempting to seize a trunk on 
the trunk group are refused if the number of busy trunks on the trunk 
group has reached the threshold. For the nonhierarchical networks, 
this control was implemented by subjecting a call to trunk reservation 
on all legs of an alternate path. In the hierarchical networks, the 
classification of a call as first-routed or alternate-routed was made at 
each switching system that the call traversed. Each call was classified 
as follows: On the first-choice trunk group out of a switching system
the call was considered first-routed, whereas on any other trunk group
it was considered alternate-routed. All alternate-routed calls offered
to a trunk group were subjected to this control. A call overflowing a 
trunk group because of trunk reservation was offered to the next path 
in its route. When the first path of a route uses fewer trunk groups 
than the alternate paths (e.g., the first path is a direct path), then 
under large loads this control has the effect of decreasing the average 
number of trunks per call and thus increasing network carried load. 

For the trunk reservation model, let $a$ be the total trunk-group
offered load, $p$ the trunk-group blocking probability, and $q = 1 - p$.
Also, let $m$ be the trunk-reservation threshold on the trunk group, $\hat q$
the probability that no more than $m - 1$ trunks in the trunk group
are busy, $\hat a$ the trunk-group offered load that is subject to the trunk-reservation
control (i.e., the alternate-routed traffic), $\hat K$ the trunk-group
carried load subjected to trunk reservation, and $r = \hat a/a$. Using
a birth-death model for the behavior of $n$ servers with offered load $a$
when fewer than $m$ servers are busy and offered load $a(1 - r)$ when at
least $m$ servers are busy, we obtain the probabilities $P_j$ that exactly $j$
trunks on the trunk group are busy:

$$P_j = \begin{cases} \dfrac{a^j}{j!}\,P_0, & j = 0, \ldots, m-1, \\[1ex] \dfrac{a^j (1-r)^{j-m}}{j!}\,P_0, & j = m, \ldots, n. \end{cases}$$

Here

$$P_0 = \Bigl[\sum_{j=0}^{m-1} \frac{a^j}{j!} + \sum_{j=m}^{n} \frac{a^j (1-r)^{j-m}}{j!}\Bigr]^{-1}.$$

It follows that

$$p = \frac{a^n}{n!}\,(1-r)^{n-m} P_0, \qquad (5)$$

$$\hat q = \sum_{j=0}^{m-1} \frac{a^j}{j!}\,P_0. \qquad (6)$$

The quantities $p$ and $\hat q$ are easily calculated using recursive formulas.
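
A minimal sketch of one such recursive calculation is given below. The function name and argument order are hypothetical; the recursion simply builds the unnormalized birth-death weights state by state and then normalizes, which is one of several possible ways to evaluate (5) and (6).

```python
def reservation_probabilities(n, m, a, r):
    """State probabilities of a trunk group with trunk reservation.

    n: trunks, m: reservation threshold, a: total offered load (erlangs),
    r: fraction of the offered load that is alternate-routed.
    Returns (P, p, q_hat): the list of P_j, the blocking probability p of
    eq. (5), and the probability q_hat of eq. (6) that fewer than m trunks
    are busy.
    """
    weights = [1.0]                                   # unnormalized P_0
    for j in range(1, n + 1):
        # Arrival rate out of state j-1: all traffic below the threshold,
        # first-routed traffic only at or above it.
        rate = a if j - 1 < m else a * (1.0 - r)
        weights.append(weights[-1] * rate / j)
    total = sum(weights)
    probs = [w / total for w in weights]
    return probs, probs[n], sum(probs[:m])


probs, p, q_hat = reservation_probabilities(n=10, m=9, a=8.0, r=0.3)
carried = sum(j * pj for j, pj in enumerate(probs))  # total carried load K
```

With `r = 0` the control has no effect and `p` reduces to the Erlang-B blocking of the group.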

We also need formulas for the calculation of $a$ and $\hat a$, which we
obtain by relating these quantities to the corresponding carried loads.
The total carried load $K$ is given by

$$K = \sum_{j=0}^{n} jP_j = ar\hat q + a(1 - r)q,$$

so that the offered load that is not subject to trunk reservation is given
by

$$a(1 - r) = \frac{K(1 - r)}{r\hat q + (1 - r)q}. \qquad (7)$$

We relate the offered load $\hat a$ to the carried load $\hat K$ by

$$\hat a = \frac{\hat K}{\hat q}. \qquad (8)$$

Carried load $K$ is still calculated as in the previous section with $\hat K$
being the portion of $K$ that was subjected to trunk reservation.

The nonhierarchical model with trunk reservation for first-routed
traffic is obtained by replacing eq. (2) with eqs. (7) and (8) and
replacing eq. (3) with eqs. (5) and (6). The calculation of $D(R)$ in (4)
is now given by

$$D(R) = \Bigl(1 - \prod_{i \in r_1} q_i\Bigr) \prod_{t=2}^{k}\Bigl(1 - \prod_{i \in r_t} \hat q_i\Bigr).$$

The hierarchical model with trunk reservation for first-routed traffic
is obtained by combining eqs. (5) through (8) with the Franks and 
Rishel formulas for trunk-group carried load. Separate calculations of 
total carried load and alternate-routed carried load are made for each 
trunk group. 
This paper presents practical formulae and computer programs for the
following traffic-related problems: determination of offered load; determina- 
tion of number of trunks; equivalent random method and Hayward’s method; 
the three-moment match (construction of interrupted Poisson stream); day- 
to-day load variation, including both time and call congestions; and the 
analysis of a multiserver queue in a traffic environment. 


I. INTRODUCTION


This paper presents methods of traffic calculations that have been 
worked out and modified by the author for the typical range of 
problems encountered in traffic applications. Their efficacy depends, 
in large part, on improved computations of the Erlang loss function 
and its derivatives. Practical computer programs are given in the 
Appendix for immediate application. These programs were originally 
written by the author in TI EXTENDED BASIC but were transcribed 
to Fortran by Brian Farrell. All of the programs except for the 
GT/M/S queue were also written in POCKET BASIC for execution 
on the TRS-80' PC1l. The programs are simple and are executed 
rapidly. 

This paper does not fully develop the theories of all of the formulae 
used. Instead, it gives enough description and explanation to make the 
development of the computational formulae understandable. Numer- 
ical examples show the operation of the methods and programs. These 
results are intended to serve only as checks on the computer program- 
ming; for theoretical accuracy, one should consult the references. 
Traffic theory traditionally was concerned only with blocking phe-
nomena. However, modern systems are often analyzed with respect to 
delays, which arise because of the complex implementation of the 
communication system and the increasing range of services being 
provided. For example, delays occurring in systems that yield a re- 
sponse to a query (reservation systems, credit card systems, etc.) are 
important enough to require careful analysis. In this paper a multi- 
server delay queue GT/M/S, whose arrival stream is typical of those 
occurring in traffic theory, is analyzed using the tools of that theory, 
which are developed here. The waiting time distribution and mean 
waiting time may be conveniently calculated using the program for 
the GT/M/S queue given in the Appendix. 


II. INTERPOLATION


The Erlang loss function arises in the study of the M/M/n/n 
queueing problem, that is, a Poisson stream of calls offering a erlangs 
to a fully available trunk group of n trunks; the Erlang loss function 
expresses the blocking probability, that is, the probability that an 
arriving call is rejected because no trunk is available. It usually is
stated in the form (Ref. 1)

$$B(n, a) = \frac{a^n}{n!} \Big/ \sum_{j=0}^{n} \frac{a^j}{j!} \qquad (1)$$

or, equivalently,

$$B(n, a)^{-1} = \sum_{j=0}^{n} \frac{n!}{j!}\, a^{-(n-j)}. \qquad (2)$$


In terms of the descending factorial, $n^{(j)}$, defined by

$$n^{(0)} = 1, \qquad n^{(j)} = n(n-1)\cdots(n-j+1) \quad (j \ge 1), \qquad (3)$$

one may also write

$$B(n, a)^{-1} = \sum_{j=0}^{n} n^{(j)} a^{-j}. \qquad (4)$$
The integral representation of Fortet (Ref. 2) may be obtained from (4) as
follows. From the Eulerian integral

$$a^{-j} = a \int_0^\infty e^{-ay}\,\frac{y^j}{j!}\,dy, \qquad (5)$$

one has

$$B(n, a)^{-1} = a \int_0^\infty e^{-ay} \sum_{j=0}^{n} \frac{n^{(j)}}{j!}\,y^j\,dy \qquad (6)$$

and, hence,

$$B(n, a)^{-1} = a \int_0^\infty e^{-ay}(1 + y)^n\,dy. \qquad (7)$$


Since the above integral has meaning for nonintegral $n$, the general
definition of $B(x, a)$, in which $x > 0$ is unrestricted and $a > 0$, is (see
Ref. 3)

$$B(x, a)^{-1} = a \int_0^\infty e^{-ay}(1 + y)^x\,dy. \qquad (8)$$

This relates $B(x, a)$ to the incomplete gamma function (Ref. 4), namely,

$$B(x, a)^{-1} = a^{-x}e^{a} \int_a^\infty e^{-y}y^x\,dy \qquad (9)$$

$$= a^{-x}e^{a}\,\Gamma(x + 1, a). \qquad (10)$$


The function $B(x, a)^{-1}$ satisfies an important linear difference equation
that will now be obtained. Integration by parts applied to (8)
yields

$$B(x, a)^{-1} = 1 + x \int_0^\infty e^{-ay}(1 + y)^{x-1}\,dy. \qquad (11)$$

Hence,

$$B(x, a)^{-1} = \frac{x}{a}\,B(x - 1, a)^{-1} + 1. \qquad (12)$$

This is an excellent recursion for the successive computation of
$B(x, a)^{-1}$. For integral values of $x$, the initial value $B(0, a) = 1$ is
convenient.
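
As a small illustration of recursion (12), the following sketch computes $B(n, a)$ for integral $n$ starting from $B(0, a) = 1$. It is not the program of the Appendix; the function name is chosen here for the example only.

```python
def erlang_b(n, a):
    """Erlang loss function B(n, a) for integer n >= 0 and offered load a > 0.

    Applies eq. (12), 1/B(k, a) = (k/a) * 1/B(k-1, a) + 1, for k = 1..n.
    Working with the reciprocal keeps every intermediate value >= 1.
    """
    inv_b = 1.0                      # 1/B(0, a)
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


blocking = erlang_b(10, 8.0)         # 10 trunks offered 8 erlangs
```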

For use in the equivalent random method (Ref. 5) and Hayward's approximation
(Ref. 6), to be discussed later, it is important to have an easily
calculable approximation for $B(x, a)$ when $x$ is nonintegral. For this
purpose Newton's interpolation formula will be used (Ref. 7). Define the
forward difference operator, $\Delta$, by

$$\Delta f(x) = f(x + 1) - f(x). \qquad (13)$$

Then, the powers $\Delta^2, \Delta^3, \ldots$, are defined by successive application of
$\Delta$, and thus

$$\Delta^2 f(x) = f(x + 2) - 2f(x + 1) + f(x). \qquad (14)$$

Newton's interpolation formula is

$$f(x + h) = \sum_{j=0}^{\infty} \binom{h}{j} \Delta^j f(x). \qquad (15)$$




Let

$$f(x) = \ln B(x, a), \qquad (16)$$

$$n = [x], \qquad h = x - n, \qquad (17)$$

in which the brackets designate integral part. Newton's formula is
now applied up to the second difference to obtain

$$\ln B(x, a) = \ln B(n, a) + h\,\Delta \ln B(n, a) + \tfrac{1}{2}\,h(h - 1)\,\Delta^2 \ln B(n, a). \qquad (18)$$

For convenience in writing, the following abbreviations are used:

$$B_0 = B(n, a), \qquad B_1 = B(n + 1, a), \qquad B_2 = B(n + 2, a). \qquad (19)$$

One now obtains, from (18),

$$B(x, a) = B_0^{1-h}\,B_1^{h}\left[\frac{B_1^2}{B_0B_2}\right]^{h(1-h)/2}. \qquad (20)$$

The accuracy of (20) improves with increasing $x$. The worst error
occurs at $h = 0.5$. Some comparisons are given in Table I.
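
The interpolation can be tried out with a short sketch such as the one below. It assumes the second-difference form of Newton's formula applied to $\ln B$ as described above; the function names are illustrative and the code is not the Appendix program.

```python
import math


def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the recursion on 1/B."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def erlang_b_real(x, a):
    """Approximate B(x, a) for real x >= 0 by second-order interpolation of ln B."""
    n = math.floor(x)                # integral part [x]
    h = x - n                        # fractional part
    b0, b1, b2 = erlang_b(n, a), erlang_b(n + 1, a), erlang_b(n + 2, a)
    # ln B(x) = ln B0 + h * (ln B1 - ln B0)
    #           + h(h-1)/2 * (ln B2 - 2 ln B1 + ln B0), rewritten as a product
    return b0 ** (1 - h) * b1 ** h * (b1 * b1 / (b0 * b2)) ** (h * (1 - h) / 2)


approx = erlang_b_real(1.5, 0.1)     # first row of Table I
```

For integral `x` the fractional part is zero and the function returns the exact recursion value.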


III. DERIVATIVES


For economic considerations and for iteration formulae for the
solution of equations involving $B(x, a)$, as exemplified later in this
paper, the derivatives $\partial B(x, a)/\partial a = B_a$ and $\partial B(x, a)/\partial x = B_x$ are
needed (Ref. 8). The symbol $B = B(x, a)$ will be used.

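From (8) differentiation with respect to $a$ yields

$$-B^{-2}B_a = \int_0^\infty e^{-ay}(1 + y)^x\,dy - a\int_0^\infty y\,e^{-ay}(1 + y)^x\,dy \qquad (21)$$

$$= \int_0^\infty e^{-ay}(1 + y)^x\,dy - a\int_0^\infty e^{-ay}(1 + y)^{x+1}\,dy + a\int_0^\infty e^{-ay}(1 + y)^x\,dy \qquad (22)$$

$$= \left(\frac{1}{a} + 1\right)B^{-1} - B(x + 1, a)^{-1} \qquad (23)$$

$$= \left(\frac{1}{a} + 1\right)B^{-1} - \left(1 + \frac{x + 1}{a}\,B^{-1}\right). \qquad (24)$$

Hence,

$$B_a = \left(\frac{x}{a} - 1 + B\right)B. \qquad (25)$$

Table I—Comparison of interpolation with exact values

      x       a     B (exact)   B from (20)
      1.5     0.1   0.02155     0.02132
      5.5     2     0.02146     0.02145
     10.5     8     0.10011     0.10013
    100.5    90     0.02517     0.02517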

For $B_x$ one has, from (8),

$$B_x = -B^2 a \int_0^\infty e^{-ay}(1 + y)^x \ln(1 + y)\,dy. \qquad (26)$$

Unfortunately, there is no exact evaluation of (26) in convenient form.

Useful, easily calculable approximations may, however, be obtained.
First, a crude but useful approximation will be obtained. Let

$$f(y) = Ba\,e^{-ay}(1 + y)^x. \qquad (27)$$

Then, from (8),

$$f(y) \ge 0, \qquad \int_0^\infty f(y)\,dy = 1. \qquad (28)$$

Also, one has

$$\int_0^\infty y f(y)\,dy = B(x + 1, a)^{-1}B - 1 \qquad (29)$$

$$= \frac{x + 1}{a} + B - 1. \qquad (30)$$
a 


Jensen's inequality (Ref. 9) for a random variable $\xi$ and a function $g(x)$
convex on the range of $\xi$ is

$$E\,g(\xi) \ge g(E\xi). \qquad (31)$$


Accordingly, let $\xi$ have the density function $f(y)$. Then, (26) may be
expressed as

$$B_x/B = -E \ln(1 + \xi). \qquad (32)$$

Since $-\ln(1 + y)$ is convex on $y \ge 0$, using Jensen's inequality gives

$$B_x/B \ge -\ln\left(\frac{x + 1}{a} + B\right), \qquad (33)$$

in which $E\xi$ was obtained from (30). It will be convenient to set

$$\alpha = \frac{x + 1}{a} + B, \qquad (34)$$

so that one has

$$B_x/B \ge -\ln\alpha. \qquad (35)$$


To obtain a better approximation to $B_x$ than the lower bound of
(33), the difference equation (12) will be used. Again, let

$$f(x) = \ln B(x, a).$$

Then, the difference equation becomes

$$f(x + 1) - f(x) = -\ln\alpha. \qquad (36)$$

The Taylor expansion for $f(x + 1) - f(x)$ gives

$$f'(x) \simeq -\ln\alpha - \tfrac{1}{2}f''(x). \qquad (37)$$

From $f'(x) \simeq -\ln\alpha$, one has $f'' \simeq -\alpha'/\alpha$; hence,

$$f'(x) \simeq -\ln\alpha + \frac{\alpha'}{2\alpha}. \qquad (38)$$

One has

$$f'(x) = \frac{B_x}{B}, \qquad (39)$$

$$\alpha' = \frac{1}{a} + B_x. \qquad (40)$$

Substituting these values of $f'$, $\alpha'$ into (38) yields the following
approximation for $B_x$:

$$\frac{B_x}{B} \simeq -\frac{\ln\alpha - 1/(2a\alpha)}{1 - B/(2\alpha)}. \qquad (41)$$

With (35) designated as bound and (41) designated as approximation,
Table II presents comparisons with exact values taken from the table
of Akimaru and Nishimura. Throughout the table $B = 0.01$.
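
The bound and the approximation can be evaluated with a few lines of code. The sketch below assumes integral $x$ and the forms $-B_x/B \le \ln\alpha$ and $-B_x/B \simeq [\ln\alpha - 1/(2a\alpha)]/[1 - B/(2\alpha)]$ with $\alpha = (x + 1)/a + B$; the function names are hypothetical.

```python
import math


def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the recursion on 1/B."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def log_derivative_estimates(x, a):
    """Return (bound, approximation) for -B_x/B at integer x.

    bound:         ln(alpha), the Jensen bound
    approximation: the refined value obtained from the Taylor expansion
    """
    b = erlang_b(x, a)
    alpha = (x + 1) / a + b
    bound = math.log(alpha)
    approx = (math.log(alpha) - 1 / (2 * a * alpha)) / (1 - b / (2 * alpha))
    return bound, approx


bound, approx = log_derivative_estimates(5, 1.3608)   # first row of Table II
```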


IV. DETERMINATION OF OFFERED LOAD 


In the equation

$$B(x, a) = P, \qquad (42)$$

in which $x$ and $P$ are given, $a$ is to be determined. Newton's method
of iteration is well suited to the problem. Let $a_0$ be an initial approximation,
and let $a_1$ be the refined result. Then,

$$a_1 = a_0 - \frac{B_0 - P}{(B_a)_0}, \qquad (43)$$

in which the subscript 0 indicates evaluation at $a = a_0$. Using (25) one
has

$$a_1 = a_0 - \frac{B_0 - P}{\left(\dfrac{x}{a_0} - 1 + B_0\right)B_0}. \qquad (44)$$

Table II—Comparisons of derivative values

     x    a          -B_x/B    Bound     Approximation
     5     1.3608    1.4025    1.4860    1.4044
    10     4.4612    0.8626    0.9065    0.8630
    20    12.0306    0.5406    0.5628    0.5406
    50    37.9014    0.2956    0.3042    0.2956

The problem of obtaining a good starting point remains. For this
purpose an inequality for $B$ is obtained.
From

$$(1 + y)^x < e^{xy} \qquad (45)$$

and (8) one gets

$$B(x, a)^{-1} < \frac{a}{a - x}, \qquad x < a. \qquad (46)$$

Using (46) in (12) now yields

$$B(x, a) > 1 - \frac{x}{a + 1}, \qquad (47)$$





with no restriction on $x$. Thus, setting $B = P$ in (47), the initial value
can be

$$a_0 = \frac{x}{1 - P} - 1. \qquad (48)$$

As an example of the convergence rate of (44) starting with (48),
consider $x = 20$, $P = 0.01$. The following values were obtained using
the program given in the Appendix.

$a_0 = 19.202$, $a_1 = 14.057$, $a_2 = 12.568$,
$a_3 = 12.088$, $a_4 = 12.031$, $a_5 = 12.031$.

Another important case occurs when the carried load, $L$, is specified
and the offered load is required; the relevant equation is

$$L = a(1 - B(x, a)). \qquad (49)$$

Newton's formula in the form

$$a_1 = a_0 - \frac{a_0(1 - B_0) - L}{(L_a)_0} \qquad (50)$$

is used. Since

$$L_a = 1 - B - aB_a, \qquad (51)$$

one has, using (25) and (50),

$$a_1 = a_0 - \frac{a_0(1 - B_0) - L}{1 - B_0 - (x - a_0 + a_0B_0)B_0}. \qquad (52)$$
The inequality 
L 
a<L ( + =a (53) 
gives a convenient starting value for (52); thus, 
ee» 
«=L(1+24—), (54) 
The convergence rate is the same as in (44). 
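
The Newton iteration for the offered load at a prescribed blocking, with starting value $a_0 = x/(1 - P) - 1$, can be sketched as follows. The code assumes integral $x$ and uses the derivative $B_a = (x/a - 1 + B)B$; the function names, tolerance, and iteration cap are choices made for this example, not taken from the Appendix program.

```python
def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the recursion on 1/B."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def offered_load(x, p_target, tol=1e-9, max_iter=50):
    """Solve B(x, a) = p_target for a; returns (a, list of iterates)."""
    a = x / (1.0 - p_target) - 1.0           # starting value
    iterates = [a]
    for _ in range(max_iter):
        b = erlang_b(x, a)
        step = (b - p_target) / ((x / a - 1.0 + b) * b)   # (B - P) / B_a
        a -= step
        iterates.append(a)
        if abs(step) < tol:
            break
    return a, iterates


a, iterates = offered_load(20, 0.01)
```

The first element of `iterates` is the starting value, so the list can be compared directly with the sequence $a_0, a_1, \ldots$ quoted in the text.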
V. DETERMINATION OF NUMBER OF TRUNKS 
The equation

$$B(x, a) = P \qquad (55)$$

will now be solved for $x$ given $a$ and $P$. The Newton iteration formula
now reads

$$x_1 = x_0 - \frac{B_0 - P}{(B_x)_0}. \qquad (56)$$

The bound for $B_x$, namely (33), will be used in (56); thus

$$x_1 = x_0 + \frac{B_0 - P}{B_0 \ln\alpha_0}. \qquad (57)$$

The starting value for $x$ is again obtained from (47); it is

$$x_0 = (1 - P)(1 + a). \qquad (58)$$


One must use (20) to evaluate $B_0$, since $x$ need not be an integer. Using
the approximate value (33) for $B_x$ does not impair the accuracy of the
result. It merely slows down the convergence rate over what Newton's
method would provide with the exact derivative.

Using the program of the Appendix and $a = 12.031$, $P = 0.01$, the
following values are obtained:

$x_0 = 12.901$, $x_1 = 16.319$, $x_2 = 18.352$, $x_3 = 19.505$,
$x_4 = 19.932$, $x_5 = 19.997$, $x_6 = 20.000$.
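
A sketch of this iteration is shown below. It assumes the update $x_1 = x_0 + (B_0 - P)/(B_0 \ln\alpha_0)$ with $\alpha = (x + 1)/a + B$, the starting value $x_0 = (1 - P)(1 + a)$, and second-order interpolation of $\ln B$ for nonintegral $x$. All function names and the stopping rule are illustrative assumptions.

```python
import math


def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the recursion on 1/B."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def erlang_b_real(x, a):
    """B(x, a) for real x by second-order interpolation of ln B."""
    n = math.floor(x)
    h = x - n
    b0, b1, b2 = erlang_b(n, a), erlang_b(n + 1, a), erlang_b(n + 2, a)
    return b0 ** (1 - h) * b1 ** h * (b1 * b1 / (b0 * b2)) ** (h * (1 - h) / 2)


def trunks_required(a, p_target, tol=1e-9, max_iter=200):
    """Solve B(x, a) = p_target for a (generally nonintegral) x."""
    x = (1.0 - p_target) * (1.0 + a)          # starting value
    for _ in range(max_iter):
        b = erlang_b_real(x, a)
        alpha = (x + 1.0) / a + b
        step = (b - p_target) / (b * math.log(alpha))  # uses B_x ~ -B ln(alpha)
        x += step
        if abs(step) < tol:
            break
    return x


x = trunks_required(12.031, 0.01)
```

Because the derivative is only approximated, the iteration converges linearly rather than quadratically, which is why the loop allows more passes than a true Newton method would need.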
Fig. 1—Common trunk group. [Diagram labels: input stream (m, z); c trunks; overflow.]


VI. EQUIVALENT RANDOM METHOD AND HAYWARD’S METHOD 


Figure 1 schematizes the problem of ascertaining blocking on a 
common trunk group. The common trunk group is offered a composite 
stream, which is the superposition of overflow streams. 

The input stream is characterized by the offered load m and peaked- 
ness z. The equivalent random method considers the stream (m, z) to 
be the overflow stream of a fictitious trunk group of x trunks and 
Poisson-offered load a. This is shown in Fig. 2. 

If the stream (m, z) is not in fact the overflow of a single trunk 
group, then x need not be an integer. The Kosten formulae for overflow 
and peakedness are 


$$m = aB(x, a), \qquad (59)$$

$$z = 1 - m + \frac{a}{x + m + 1 - a}. \qquad (60)$$

These can be arranged in the following form:

$$m = aB\left(a\,\frac{m + z}{m + z - 1} - m - 1,\; a\right), \qquad (61)$$

$$x = a\,\frac{m + z}{m + z - 1} - m - 1. \qquad (62)$$


The problem is to determine the equivalent random parameters $x$ and
$a$; that is, to solve (61) for $a$ and then obtain $x$ from (62). When $x$ and
$a$ are known then, of course, the required blocking probability, $P$, is
given by

$$P = \frac{B(x + c, a)}{B(x, a)}. \qquad (63)$$

Fig. 2—Common and fictitious trunk groups. [Diagram labels: Poisson load a offered to x trunks; overflow stream (m, z) offered to c trunks; overflow.]

Newton's method will be used to solve for $a$. Thus,

$$a_1 = a_0 - \frac{a_0B_0 - m}{\left[\dfrac{d}{da}(aB)\right]_0}, \qquad (64)$$

in which $B = B\left(a\,\dfrac{m + z}{m + z - 1} - m - 1,\; a\right)$.
One has

$$\frac{d}{da}(aB) = B + (x + m + 1)B_x + aB_a, \qquad (65)$$

and using (25) and (33), with $aB$ replaced by $m$,

$$\frac{d}{da}(aB) \simeq [x + m + 1 - a - (x + m + 1)\ln\alpha]\,B. \qquad (66)$$

Newton's eq. (64) now becomes

$$a_1 = a_0 - \frac{a_0B_0 - m}{[x_0 + m + 1 - a_0 - (x_0 + m + 1)\ln\alpha_0]\,B_0}. \qquad (67)$$

The formula of Rapp (Ref. 10), namely,

$$a_0 = mz + 3z(z - 1), \qquad (68)$$


will start the iteration. 

The following test case was used on the program in the Appendix. 
For x = 10, a = 8 one finds, from (59) and (60), that m = 0.97329, z = 
2.04016. The results of the run are 


$x_0 = 10.527$, $x_1 = 10.181$, $x_2 = 10.067$, $x_3 = 10.025$, $x_4 = 10.010$, $x_5 = 10.004$,
$a_0 = 8.352$, $a_1 = 8.121$, $a_2 = 8.045$, $a_3 = 8.017$, $a_4 = 8.007$, $a_5 = 8.003$.
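
The test case can be reproduced approximately with the sketch below, which assumes the Kosten relations $m = aB(x, a)$ and $z = 1 - m + a/(x + m + 1 - a)$, Rapp's starting value $a_0 = mz + 3z(z - 1)$, and the approximate Newton step with slope $[x + m + 1 - a - (x + m + 1)\ln\alpha]B$. The function names, tolerance, and iteration cap are assumptions for this example.

```python
import math


def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the recursion on 1/B."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def erlang_b_real(x, a):
    """B(x, a) for real x by second-order interpolation of ln B."""
    n = math.floor(x)
    h = x - n
    b0, b1, b2 = erlang_b(n, a), erlang_b(n + 1, a), erlang_b(n + 2, a)
    return b0 ** (1 - h) * b1 ** h * (b1 * b1 / (b0 * b2)) ** (h * (1 - h) / 2)


def overflow_moments(x, a):
    """Kosten mean m and peakedness z of the overflow from x trunks, load a."""
    m = a * erlang_b_real(x, a)
    z = 1.0 - m + a / (x + m + 1.0 - a)
    return m, z


def equivalent_random(m, z, tol=1e-9, max_iter=500):
    """Find (x, a) whose overflow has mean m and peakedness z (z > 1)."""
    a = m * z + 3.0 * z * (z - 1.0)                      # Rapp's starting value
    for _ in range(max_iter):
        x = a * (m + z) / (m + z - 1.0) - m - 1.0
        b = erlang_b_real(x, a)
        alpha = (x + 1.0) / a + b
        slope = (x + m + 1.0 - a - (x + m + 1.0) * math.log(alpha)) * b
        step = (a * b - m) / slope
        a -= step
        if abs(step) < tol:
            break
    return a * (m + z) / (m + z - 1.0) - m - 1.0, a


m, z = overflow_moments(10, 8.0)
x, a = equivalent_random(m, z)
```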


The equivalent random method is usually used only when the service 
distribution on the common group is exponential, since the Kosten 
formula was derived for that distribution. However, in this regard, see 
Ref. 11 for a discussion of the constant service time case. To consider 




blocking on a common group with other service distributions, the 
approximation of W. S. Hayward is used. (See Ref. 6 for example.) In
Fig. 1 the approximation is

$$P_H = B\left(\frac{c}{z}, \frac{m}{z}\right). \qquad (69)$$

To use (69), the peakedness, z, must be referred to the service distri- 
bution that is considered. This will now be discussed.’ For a service 
distribution, F(x), with service rate p, let 


Fo(x) = F(x/p), (70) 
that is, the distribution scaled to unit rate, and further 


Fo(x) = 1 — F(x), 


F§(y) = i F6(x)F6(x + y)dx. (71) 


The notation 2(F6; ») will be used to show the dependence of z on p 
as a function and on F as a functional. Since z is usually known 
relative to some distribution, it would be useful to be able to transform 
z to other distributions. The Mellin transform will accomplish this. 
For a function, f(x), defined on (0, %), the function f(s) defined by 


f(s) = if x "f(x)dx (72) 
is called the Mellin transform of f(x). Let 
f(u) = 2(Fo; vw) — 1, (73) 
&(u) = 2(Go; w) — 1. (74) 
Then, the required transformation formula for z is 
_, _ G(s) = 


For a renewal stream with renewal density m(7), the peakedness may 
be calculated directly by 


ee ih Sis ; + Qu i; F°®)(u) m(u)du. (76) 


As an example, consider the stream given by

$$m(\tau) = \lambda + Ae^{-\alpha\tau}. \qquad (77)$$

By (76), $z$ relative to exponential service is

$$z(e^{-x}; \mu) = 1 + \frac{A}{\alpha + \mu}. \qquad (78)$$

It is desired to transform this to the service distribution

$$G(x) = 1 - (x + 1)e^{-x}, \qquad (79)$$

for which $\mu = 1/2$. One has








F9%s) = 5 TO), (80) 
¢,(2) eee 3 et 
GEM%s) = 5 2-T(s) + 72° T(s + 1). (81) 
Hence, 
GES) Deh Buss 
Fey 42 +72 (82) 
Since 
F(s) = —— Aa, (83) 
sin 7s 
one now obtains from (75) and (82), 
epee Ae ce (84) 
as’ sin xs [8 \2 a0) das 
Hence, 
ee. ae! 12y 
2G; w) =1+ 5 |. oa |. (85) 


Since $\mu = 1/2$, one finally has


A 10 6 
pa144) 0 8], (86) 


Consider the following numerical examples. Let the common group
have exponential service distribution, and let $c = 15$, $m = 10$, $z = 3$.
Then, the equivalent random parameters are $x = 39.615$, $a = 46.721$.
Thus, one has

$$P = \frac{B(54.615, 46.721)}{B(39.615, 46.721)} = 0.151, \qquad (87)$$

$$P_H = B(5, 3.333) = 0.139. \qquad (88)$$

Let the service rate for this example be $\mu = 1$, and let the arrival
stream be that defined by (77) with $\lambda = 10$, $A = 4$, $\alpha = 1$. This is
consistent with $m = 10$, $z = 3$. Now let the service distribution be that
of (79). Then, by (86)

$$z = 4.25 \qquad (89)$$

and

$$P_H = B\left(\frac{15}{4.25}, \frac{10}{4.25}\right) = 0.187. \qquad (90)$$
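
Hayward's approximation is simple enough to sketch in a few lines. The code below assumes the form $B(c/z, m/z)$ and evaluates the Erlang function at nonintegral arguments by second-order interpolation of $\ln B$; the function names are hypothetical.

```python
import math


def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the recursion on 1/B."""
    inv_b = 1.0
    for k in range(1, n + 1):
        inv_b = 1.0 + (k / a) * inv_b
    return 1.0 / inv_b


def erlang_b_real(x, a):
    """B(x, a) for real x by second-order interpolation of ln B."""
    n = math.floor(x)
    h = x - n
    b0, b1, b2 = erlang_b(n, a), erlang_b(n + 1, a), erlang_b(n + 2, a)
    return b0 ** (1 - h) * b1 ** h * (b1 * b1 / (b0 * b2)) ** (h * (1 - h) / 2)


def hayward_blocking(c, m, z):
    """Hayward's approximation: blocking of a stream (m, z) on c trunks."""
    return erlang_b_real(c / z, m / z)


p_exponential = hayward_blocking(15, 10.0, 3.0)    # z referred to exponential service
p_transformed = hayward_blocking(15, 10.0, 4.25)   # z after the transformation
```

The only input that changes between the two calls is the peakedness, which is why the peakedness must be referred to the service distribution actually in use.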


VII. THE THREE-MOMENT MATCH


The interrupted Poisson process is on (flows) for an exponential
period of time whose mean duration is $\gamma^{-1}$, and is off (stopped) for an
exponential period whose mean duration is $\omega^{-1}$. This may be thought
of as a Poisson process of rate $\lambda$ entering a switch that is alternately
closed and opened. The output of the switch is the interrupted Poisson
process with a rate $\lambda'$. This is shown in Fig. 3.

A stream that is the overflow of a single trunk group with Poisson
offered load will be called an 0-stream. The interrupted Poisson
process provides a useful approximation to an 0-stream.” The tech- 
nique of approximation offers the 0-stream to an infinite server group 
and offers the interrupted Poisson stream to another infinite server 
group. In each group the distribution of the number of busy servers at 
any instant of time is obtained. The first three moments of these 
distributions are then equated. They give equations that define the 
interarrival time distribution of the interrupted Poisson stream. Since 
this stream is renewal, it is now completely defined. 

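Let the 0-stream be the overflow from a trunk group with $x$ trunks
and offered load $a$, and let

$$B_0 = B(x, a), \qquad B_1 = B(x + 1, a), \qquad B_2 = B(x + 2, a). \qquad (91)$$

Then, the required equations for $\lambda$, $\lambda'$, and the switch parameters, $\gamma$,
$\omega$, are

$$\delta_0 = aB_0, \qquad \delta_1 = \frac{a\,\delta_0^{-1}}{B_1^{-1} - B_0^{-1}}, \qquad (92)$$

$$\delta_2 = \frac{2a\,\delta_0^{-1}\delta_1^{-1}}{B_2^{-1} - 2B_1^{-1} + B_0^{-1}}, \qquad (93)$$

$$\lambda = \frac{\delta_2(\delta_1 - \delta_0) - \delta_0(\delta_2 - \delta_1)}{(\delta_1 - \delta_0) - (\delta_2 - \delta_1)}, \qquad (94)$$

$$\omega = \frac{\delta_0}{\lambda}\,\frac{\lambda - \delta_1}{\delta_1 - \delta_0}, \qquad (95)$$

$$\gamma = \frac{\omega}{\delta_0}\,(\lambda - \delta_0), \qquad (96)$$

$$\lambda' = \frac{\lambda\omega}{\gamma + \omega}. \qquad (97)$$

Fig. 3—Generation of interrupted Poisson process. [Diagram labels: Poisson stream of rate $\lambda$ entering the switch; interrupted Poisson stream of rate $\lambda'$ leaving it.]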
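The Laplace transform, $\tilde f(s)$, of the interarrival time density function,
$f(x)$, is

$$\tilde f(s) = \frac{\lambda s + \lambda\omega}{s^2 + (\lambda + \gamma + \omega)s + \lambda\omega}, \qquad (98)$$

and the peakedness relative to exponential service is

$$z(\mu) = 1 + \frac{\gamma}{\gamma + \omega}\,\frac{\lambda}{\gamma + \omega + \mu}. \qquad (99)$$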


The equations for the three-moment match were set up in Ref. 13, 
taking $\mu = 1$. Thus, (99) yields the exact peakedness of the O-stream 
for $\mu = 1$. A program for the evaluation of $\lambda$, $\lambda'$, $\gamma$, $\omega$ is given in the 
Appendix. 

The following numerical example was evaluated. For x = 10, a = 8, 
one finds $\lambda = 5.4405$, $\gamma = 2.7054$, $\omega = 0.5894$, $\lambda' = 0.9733$, and z = 
2.0402. 
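
The three-moment match can be checked with a short script. The following is a minimal Python sketch, not the Appendix program: it assumes an integer number of trunks x and unit service rate, and the function names (`erlang_b_inverse`, `three_moment_match`, `ipp_mean_and_peakedness`) are illustrative. It evaluates (92) through (97) and (99) as they are coded in the Fortran program "Three-Moment Match".

```python
def erlang_b_inverse(n, a):
    """Return 1/B(n, a) for integer n via y_j = (j/a) * y_{j-1} + 1."""
    y = 1.0
    for j in range(1, n + 1):
        y = j / a * y + 1.0
    return y


def three_moment_match(x, a):
    """Return (lam, gamma, omega) for the overflow of x trunks with load a."""
    y0 = erlang_b_inverse(x, a)          # 1/B
    y1 = erlang_b_inverse(x + 1, a)      # 1/B_1
    y2 = erlang_b_inverse(x + 2, a)      # 1/B_2
    d0 = 1.0 / y0                                         # (92)
    d1 = y0 / (a * (y1 - y0))                             # (92)
    d2 = 2.0 * y0 / (a * a * d1 * (y2 - 2.0 * y1 + y0))   # (93)
    lam = a * (d2 * (d1 - d0) - d0 * (d2 - d1)) / (2.0 * d1 - d0 - d2)  # (94)
    omega = d0 / lam * (lam - a * d1) / (d1 - d0)         # (95)
    gamma = omega / a * (lam - a * d0) / d0               # (96)
    return lam, gamma, omega


def ipp_mean_and_peakedness(lam, gamma, omega, mu=1.0):
    """Return the mean rate (97) and the peakedness (99) of the stream."""
    mean_rate = lam * omega / (gamma + omega)
    z = 1.0 + lam * gamma / ((gamma + omega) * (gamma + omega + mu))
    return mean_rate, z


if __name__ == "__main__":
    lam, gamma, omega = three_moment_match(10, 8.0)
    print(lam, gamma, omega, ipp_mean_and_peakedness(lam, gamma, omega))
```

Run for x = 10, a = 8, the sketch should give values close to those quoted in the numerical example above. Non-integer x would need the interpolation used in the Appendix programs.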

The interrupted Poisson process is also used to construct a stream 
with given parameters (m, z) (z > 1). The equivalent random parameters 
x, a are first found, then the above equations are used, even when 
x is not an integer, to obtain the parameters for the interrupted Poisson 
stream. As an example, consider the case used earlier for which m = 
10, z = 3, with unit service rate. Using x = 39.6148, a = 46.7214, one 
finds $\lambda = 25.9889$, $\gamma = 4.3033$, $\omega = 2.6912$. 
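
The relation between x and a used here can be illustrated numerically. The sketch below is an assumption-laden illustration in Python: it reproduces only the starting values and the trunk relation that appear in the Appendix program "Equivalent Random Parameters" (the lines `A = M*Z+3.*Z*(Z-1.)` and `X = A*(M+Z)/(M+Z-1.)-M-1.`), not the Newton iteration that refines a. The function names are hypothetical.

```python
def trunks_from_load(a, m, z):
    """Equivalent number of trunks x for equivalent load a, given (m, z)."""
    return a * (m + z) / (m + z - 1.0) - m - 1.0


def starting_values(m, z):
    """Starting point (x, a) used by the Appendix program before iterating."""
    a = m * z + 3.0 * z * (z - 1.0)
    return trunks_from_load(a, m, z), a


if __name__ == "__main__":
    # Starting point for the example m = 10, z = 3.
    print(starting_values(10.0, 3.0))
    # Trunk relation evaluated at the converged load quoted in the text.
    print(trunks_from_load(46.7214, 10.0, 3.0))
```

The second call shows that the converged pair quoted in the text, x = 39.6148 and a = 46.7214, satisfies the same trunk relation to the printed precision.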

An interrupted Poisson stream may be used in simulation studies. 
It may also be used to obtain the blocking probability of Fig. 1, instead 
of the equivalent random method or Hayward’s method. The blocking 
probability, P;,, is given by™* 

Pp=1+ ) CW) il oy, Sera 


(100) 


Since this process is constructed to be a good approximation to an 
O-stream, it is to be expected that $P_b$ in (100) should be in close 
agreement with (63), as given by the equivalent random method. The 
following numerical examples of Table III show this. 

The interrupted Poisson process is used when analyzing the delay 
queue considered in Section IX. 


VIII. DAY-TO-DAY LOAD VARIATION 

Observation of traffic on a consistent hour basis from day to day 
shows variation. To partially account for this variation,® one often 
assumes that the offered load, a, is random with a gamma density f(x). 
Thus, 





$$f(x) = \frac{x^{\alpha-1} e^{-x/\gamma}}{\gamma^{\alpha}\,\Gamma(\alpha)}, \tag{101}$$

$$\alpha\gamma = m, \qquad \alpha\gamma^2 = \sigma^2, \tag{102}$$


in which m, $\sigma^2$ are the mean and variance, respectively, of a. Any 
load-dependent statistic such as B(x, a), aB(x, a), z, etc. is replaced by its 
mean. Let g(a) be such a statistic; then it is necessary to 
evaluate Eg(a). One can construct an approximation by using the 
Gauss-Laguerre quadrature theory. Then the offered load can be 
considered to consist of two Poisson streams whose offered loads are 
$a_1$, $a_2$ and which occur with probabilities $P_1$, $P_2$, respectively. The 
quantities $P_1$, $a_1$, $P_2$, $a_2$ in terms of m and $\sigma$ are 


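$$P_1 = \frac{1}{2} - \frac{1}{2}\,\frac{\sigma}{\sqrt{m^2 + \sigma^2}}, \tag{103}$$

$$a_1 = \frac{m^2 + \sigma^2 + \sigma\sqrt{m^2 + \sigma^2}}{m}, \tag{104}$$

$$P_2 = \frac{1}{2} + \frac{1}{2}\,\frac{\sigma}{\sqrt{m^2 + \sigma^2}}, \tag{105}$$

$$a_2 = \frac{m^2 + \sigma^2 - \sigma\sqrt{m^2 + \sigma^2}}{m}. \tag{106}$$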

Thus, one has 

$$Eg(a) = P_1 g(a_1) + P_2 g(a_2). \tag{107}$$


Table III—Comparison of interrupted Poisson 
method with equivalent random method 
10 10 15 


8 15 10 
10 3 3 
0.5273 0.4910 0.1492 
0.5278 0.4912 0.1507 


TW S| a 
For example, to compute day-to-day time congestion, one uses 
EB(x, a). 
Using the symbol $\bar B$ for this quantity, one has, from (107), 

$$\bar B = P_1 B(x, a_1) + P_2 B(x, a_2). \tag{108}$$

Of possibly greater importance is the probability, $P_B$, that a call is 
blocked. To obtain this, let 

$$O(x, a) = aB(x, a), \tag{109}$$

that is, O(x, a) is the overflow traffic, and let $EO(x, a) = \bar O$; then 

$$P_B = \frac{\bar O}{m}. \tag{110}$$

Thus, one has approximately 

$$P_B = \frac{1}{m}\left[P_1 a_1 B(x, a_1) + P_2 a_2 B(x, a_2)\right]. \tag{111}$$


Wilkinson found empirically that one may often relate $\sigma^2$ to m 
through 

$$\sigma^2 = 0.13\,m^{1.5}, \quad 0.13\,m^{1.7}, \quad 0.13\,m^{1.84} \tag{112}$$

for low, medium, and high variation, respectively.° Programs for 
evaluating $\bar B$ and $P_B$ are given in the Appendix. A numerical example is 
x = 20, m = 15, and (112) for medium variation ($\sigma^2 = 12.9807$). One 
finds $P_B = 0.0795$ and $\bar B = 0.0635$. 
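
The two-point approximation is easy to reproduce. The following Python sketch is an illustration under stated assumptions: it takes an integer number of trunks, uses the two-point weights and loads as coded in the Appendix programs "Blocking Day-to-Day Time" and "Blocking Day-to-Day Call", and uses hypothetical function names.

```python
import math


def erlang_b(n, a):
    """Erlang loss B(n, a) for integer n, via the inverse recursion."""
    y = 1.0
    for j in range(1, n + 1):
        y = j / a * y + 1.0
    return 1.0 / y


def two_point_load(m, var):
    """Return ((P1, a1), (P2, a2)) as in (103) to (106)."""
    sigma = math.sqrt(var)
    root = math.sqrt(m * m + var)
    p1 = 0.5 - 0.5 * sigma / root
    a1 = (m * m + var + sigma * root) / m
    a2 = (m * m + var - sigma * root) / m
    return (p1, a1), (1.0 - p1, a2)


def day_to_day_blocking(n, m, var):
    """Return (mean time congestion (108), call congestion (111))."""
    (p1, a1), (p2, a2) = two_point_load(m, var)
    b1, b2 = erlang_b(n, a1), erlang_b(n, a2)
    time_congestion = p1 * b1 + p2 * b2
    call_congestion = (p1 * a1 * b1 + p2 * a2 * b2) / m
    return time_congestion, call_congestion


if __name__ == "__main__":
    m = 15.0
    var = 0.13 * m ** 1.7          # medium variation in (112)
    print(var, day_to_day_blocking(20, m, var))
```

For x = 20 and m = 15 with medium variation, the sketch should agree with the figures quoted above to the printed precision.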

The behavior of $\bar B$ is somewhat counter-intuitive, since, while $\bar B$ 
may at first increase with increasing $\sigma$, ultimately it decreases to 
zero.'° The behavior of $P_B$ is more satisfactory since it increases 
monotonically to one for $\sigma \to \infty$. One also has $P_B \ge \bar B$. An example for 
which $\bar B < B(x, m)$ is given by B(3, 2) = 0.2105, while $\bar B = 0.1988$ when 
$\sigma = 1$. 


IX. GT/M/S QUEUE 


Until now no delay queues were considered; however, it is becoming 
more and more important to estimate delays occurring in traffic 
systems. The approximations developed in this paper serve this pur- 
pose. In particular, the interrupted Poisson stream can be used as 
input to a GT/M/S queue for which an exact solution can be obtained. 
The peculiarity of the problem considered here is that the input stream 

is defined through specification of (m, z) rather than the usual speci- 
fications used in queueing theory (distribution of time between ar- 
rivals, etc.), hence the designation T for traffic. Considered from this 
point of view, a queue may be a queue in a traffic environment. Figure 
4 shows this. For a discussion of this approach see Heffes.?” 




Fig. 4—Traffic-type queueing system. (Diagram: input stream specified by (m, z), z > 1; S = number of servers.) 


The solution of the GT/M/S system with interrupted Poisson 
stream (obtained from three-moment match) as input may be expected 
to provide a close approximation when the input to the GT/M/S is a 
superposition of O-streams. But even in more general situations the 
approximation is good. 

The servers are assumed to have exponential service-time distribu- 
tion with unit mean service rate, and to be identical and independent. 
The paradigm for the solution is as follows: 


$$(m, z) \to (x, a) \to (\lambda, \gamma, \omega) \to w(t). \tag{113}$$

Thus, from the traffic, m, and peakedness, z, one obtains the 
equivalent random parameters x (number of trunks) and a (offered 
load), which give the interrupted Poisson parameters $\lambda$, $\gamma$, and $\omega$. The 
interrupted Poisson stream is now used as input to a GI/M/S queue 
from which the exact waiting time distribution is computed. To 
accomplish the last step, the following formulae are used.'® The 
function f(s) is given in (98). Define r to be the root of 


$$r = f\big((1 - r)S\big), \tag{114}$$

where S is the number of servers, satisfying 0 < r < 1, and $\delta = 1 - r$. 
Thus, 

$$\delta = \frac{1}{2S^2}\Big[\sqrt{\{(\lambda + \gamma + \omega)S - S^2\}^2 + 4S^2\{(\gamma + \omega)S - \lambda\omega\}} - \{(\lambda + \gamma + \omega)S - S^2\}\Big]. \tag{115}$$

Define the renewal density, m(s), by 

$$m(s) = \frac{f(s)}{1 - f(s)}. \tag{116}$$

Then, the following quantities are calculated: 

$$C_j = \prod_{i=1}^{j} m(i), \tag{117}$$

$$A^{-1} = \frac{1}{\delta} + \sum_{j=1}^{S} \binom{S}{j}\,\frac{1}{C_j\,[1 - f(j)]}\cdot\frac{S[1 - f(j)] - j}{S\delta - j}, \tag{118}$$

$$P = \frac{A}{\delta}. \tag{119}$$

One now has 

$$P\{w > 0\} = P, \tag{120}$$

$$P\{w > t\} = P e^{-S\delta t}, \tag{121}$$

$$EW = \frac{P}{S\delta}. \tag{122}$$
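
The delay computation can be sketched in a few lines. The Python below is a hedged illustration that follows the arithmetic of the Appendix program "Waiting Time in GT/M/S Queue" from the point where $\lambda$, $\gamma$, $\omega$ are known; the function names are hypothetical, the service rate is taken as one, and the case where $S\delta$ is an integer (a 0/0 term in the sum) is not handled.

```python
import math


def one_minus_f(s, lam, gamma, omega):
    """Return 1 - f(s), with f(s) the interarrival transform of (98)."""
    return (s * s + (gamma + omega) * s) / (
        s * s + (lam + gamma + omega) * s + lam * omega
    )


def delta_root(lam, gamma, omega, servers):
    """Return delta = 1 - r as the positive root of the quadratic in (115)."""
    s = float(servers)
    v = (lam + gamma + omega) * s - s * s
    v1 = (gamma + omega) * s - lam * omega
    return (math.sqrt(v * v + 4.0 * s * s * v1) - v) / (2.0 * s * s)


def waiting_time(lam, gamma, omega, servers):
    """Return (r, P, EW) for unit-rate exponential servers."""
    delta = delta_root(lam, gamma, omega, servers)
    inverse_a = 1.0 / delta
    binomial = 1.0   # running binomial coefficient C(S, j)
    c_j = 1.0        # running product of m(i) = f(i) / (1 - f(i))
    for j in range(1, servers + 1):
        binomial *= (servers - j + 1) / j
        g = one_minus_f(float(j), lam, gamma, omega)
        c_j *= (1.0 - g) / g
        inverse_a += (
            binomial / (c_j * g) * (servers * g - j) / (servers * delta - j)
        )
    p_wait = 1.0 / (inverse_a * delta)       # P = A / delta
    mean_wait = p_wait / (servers * delta)   # EW = P / (S * delta)
    return 1.0 - delta, p_wait, mean_wait


if __name__ == "__main__":
    # Interrupted Poisson parameters of the x = 10, a = 8 example, one server.
    print(waiting_time(5.4405, 2.7054, 0.5894, 1))
```

Two sanity checks are useful with such a sketch: with $\gamma = 0$ the input is Poisson and P should reduce to the Erlang delay probability, and with S = 1 one should find P = r.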


The above computation of delay is very sensitive to errors in the 
determination of f(s), which, in turn, rests on an accurate evaluation 
of B(x, a). Accordingly, another method of computing B(x, a), which 
is more accurate than (20), will be used. The formula is® 


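$$B(x, a)^{-1} = \sum_{j=0}^{[x+a]} x^{(j)} a^{-j}, \tag{123}$$

where $x^{(j)} = x(x - 1)\cdots(x - j + 1)$ and $x^{(0)} = 1$. Let 

$$U_j = x^{(j)} a^{-j}. \tag{124}$$

Then, the relative error, $\epsilon$, in calculating B(x, a) is 

$$\epsilon = -B(x, a)\,U_{[x+a]}. \tag{125}$$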

A computer program employing (123) is given in the Appendix. This 
program combines all computations needed. It accepts m, z, S and 
yields r, P, EW. A test based on $U_{[x+a]}$ is made: if $|U_{[x+a]}| \le 10^{-7}$, the 
computations are accepted and the results printed; this also reduces 
the time of computation by reducing the number of times a particular 
loop is used. If, however, $|U_{[x+a]}| > 10^{-7}$, then the computation is 
considered inaccurate and the message "not accurate" is printed. 
This situation occurs when the offered load, m, is not large and the 
peakedness is near one. For example, m = 5, z = 1.1, S = 7 prints "not 
accurate". Formula (123) is not as robust as (20) and therefore was 
not used in previous computations of this paper. If greater accuracy is 
needed in those computations, then (123) may be used in place of (20) 
with the required test for accuracy. Table IV gives some sample 
calculations. 
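
The series (123) and its acceptance test can be illustrated as follows. This is a minimal Python sketch modeled on the inner loop of the Appendix program (the `U2` and `Y` variables); the function name and the returned flag are assumptions made for illustration.

```python
def erlang_b_series(x, a, tol=1e-7):
    """Evaluate B(x, a) from the series (123).

    Returns (B, accurate), where `accurate` is True when the last term
    used satisfies |U_j| <= tol, mirroring the test in the text.
    """
    limit = int(x + a)        # [x + a]
    term = 1.0                # U_0
    total = 1.0
    for j in range(1, limit + 1):
        term *= (x - j + 1) / a      # U_j = U_{j-1} * (x - j + 1) / a
        total += term
        if abs(term) <= tol:
            break
    return 1.0 / total, abs(term) <= tol


if __name__ == "__main__":
    print(erlang_b_series(3, 2.0))
    print(erlang_b_series(10, 8.0))
```

For integer x the terms vanish after j = x, so the series terminates and reproduces the ordinary Erlang loss formula exactly; for non-integer x the flag indicates whether the truncation test was met.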


X. SUMMARY 


The mathematical methods and algorithms presented here enabled 
the construction of convenient computer programs that rapidly eval- 
uate many traffic-related problems. These represent some of the main 
approaches and approximations used in this area of traffic theory. 

Some of the routines used in the individual programs are common. 
Table IV—Examples of GT/M/S computations 

   m      z     S      r         P         EW 
  12     1.1   15    0.8160    0.3444     0.1248 
   5     3      8    0.8750    0.5132     0.5132 
   2     5      5    0.8882    0.6091     1.0894 
   2     3      4    0.8418    0.5814     0.9186 
   1     4      3    0.8468    0.6392     1.3911 
   1     3      2    0.8505    0.7446     2.4896 
   0.7   4      1    0.9354    0.9354    14.4851 


For this reason, a main index-driven program containing all of the 
programs given would shorten coding and enable the user to select one 
program after another as the need arises. 
APPENDIX 
Fortran Programs for Main Calculations in Text 
Erlang Loss Function 


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 M0, M1
      WRITE (6,10)
   10 FORMAT (/,1X,4X,'ERLANG LOSS FUNCTION',
     2 //,1X,'NO. OF TRUNKS =')
      READ (5,20) X
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'OFFERED LOAD =')
      READ (5,20) A
      N = IDINT(X)
      H = X-N
C     RECURSION FOR 1/B(N,A)
      B = 1.
      DO 50 J=1,N
      B = J/A*B+1.
   50 CONTINUE
      B1 = (N+1)/A*B+1.
      B2 = (N+2)/A*B1+1.
C     INTERPOLATION FOR NONINTEGER X
      Y = B**(1.-H)*B1**H
      Y = Y*(B1*B1/B/B2)**(H*(1.-H)/2.)
      B = 1./Y
      M0 = A*B
      Z0 = 1.-M0+A/(X+1.+M0-A)
      M1 = IDINT(100000.*M0+.5)/100000.
      Z1 = IDINT(100000.*Z0+.5)/100000.
      C = IDINT(100000.*B+.5)/100000.
      WRITE (6,70) X,A,B,M1,Z1
   70 FORMAT (/,1X,'B(',F8.4,',',
     2 F8.4,')=',F7.5,/,1X,
     3 'OVERFLOW =',F8.4,/,1X,
     4 'PEAKEDNESS =',F8.4,////)
      STOP
      END


Offered Load From Blocking 


      IMPLICIT REAL*8 (A-H,O-Z)
      WRITE (6,10)
   10 FORMAT (/,1X,2X,'OFFERED LOAD FROM',
     2 'BLOCKING',//,1X,'TRUNKS = ')
      READ (5,20) N
   20 FORMAT (I3)
      WRITE (6,30)
   30 FORMAT (1X,'BLOCKING PROB. = ')
      READ (5,40) P
   40 FORMAT (F4.4)
      A = N/(1.-P)-1.
C     ITERATE ON THE OFFERED LOAD A
      DO 60 K=1,5
      B = 1.
      DO 50 J=1,N
      B = J/A*B+1.
   50 CONTINUE
      F = 1.-P*B
      F1 = N/A-1.+1./B
      A = A-F/F1
   60 CONTINUE
      C = IDINT(10000.*A+.5)/10000.
      WRITE (6,70) C
   70 FORMAT (/,1X,'OFFERED LOAD = ',F6.4)
      STOP
      END


Offered Load From Carried Load 


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 L
      WRITE (6,10)
   10 FORMAT (/,1X,2X,'OFF.LOAD FROM',
     2 'CAR. LOAD',//,1X,'TRUNKS = ')
      READ (5,20) N
   20 FORMAT (I3)
      WRITE (6,30)
   30 FORMAT (1X,'CARLOAD=')
      READ (5,40) L
   40 FORMAT (F7.4)
      A = L*(1.+L/N/(N-L))
C     NEWTON ITERATION ON A*(1-B(N,A)) = L
      DO 60 K=1,5
      B = 1.
      DO 50 J=1,N
      B = J/A*B+1.
   50 CONTINUE
      B = 1./B
      F = A*(1.-B)-L
      F1 = 1.-B-(N-A+A*B)*B
      A = A-F/F1
   60 CONTINUE
      C = IDINT(10000.*A+.5)/10000.
      WRITE (6,70) C
   70 FORMAT (/,1X,'OFFERED LOAD=',
     2 F8.4,////)
      STOP
      END


Trunks From Blocking 


      IMPLICIT REAL*8 (A-H,O-Z)
      WRITE (6,10)
   10 FORMAT (/,1X,4X,'TRUNKS FROM',
     2 'BLOCKING',//,1X,'OFFERED LOAD = ')
      READ (5,20) A
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'BLOCKING PROB. = ')
      READ (5,40) P
   40 FORMAT (F4.4)
      X = (1.-P)*(1.+A)
C     ITERATE ON THE NUMBER OF TRUNKS X
      DO 60 K=1,10
      N = IDINT(X)
      H = X-N
      B = 1.
      DO 50 J=1,N
      B = J/A*B+1.
   50 CONTINUE
      B1 = (N+1)/A*B+1.
      B2 = (N+2)/A*B1+1.
      Y = B**(1.-H)*B1**H
      Y = Y*(B1*B1/B/B2)**(H*(1.-H)/2.)
      G = (X+1.)/A+1./Y
      X = X+(1.-P*Y)/DLOG(G)
   60 CONTINUE
      C = IDINT(10000.*X+.5)/10000.
      WRITE (6,70) C
   70 FORMAT (/,1X,'NO. OF TRUNKS = ',
     2 F8.4)
      STOP
      END


Equivalent Random Parameters 


      IMPLICIT REAL*8 (A-H,O-Z)
C     M IS THE (REAL) OFFERED LOAD
      REAL*8 M
      WRITE (6,10)
   10 FORMAT (/,1X,5X,'EQUIV. RAND.',
     2 'PARA.',//,1X,'LOAD= ')
      READ (5,20) M
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'PEAKEDNESS = ')
      READ (5,40) Z
   40 FORMAT (F7.4)
C     STARTING VALUES FOR A AND X
      A = M*Z+3.*Z*(Z-1.)
      X = A*(M+Z)/(M+Z-1.)-M-1.
      DO 70 K=1,30
      N = IDINT(X)
      H = X-N
      B = 1.
      DO 50 J=1,N
      B = J/A*B+1.
   50 CONTINUE
      B1 = (N+1.)/A*B+1.
      B2 = (N+2.)/A*B1+1.
      Y = B**(1.-H)*B1**H
      Y = Y*(B1*B1/B/B2)**(H*(1.-H)/2.)
      B = 1./Y
      G = (X+1.)/A+B
      R = A*B-M
      D = B*(X+M+1.-A-(X+M+1.)*DLOG(G))
      A = A-R/D
      X = A*(M+Z)/(M+Z-1.)-M-1.
   70 CONTINUE
      X1 = IDINT(10000.*X+.5)/10000.
      A1 = IDINT(10000.*A+.5)/10000.
      WRITE (6,80) X1,A1
   80 FORMAT (/,1X,'EQUIV.TRUNKS=',
     2 F8.4,/,1X,' EQUIV. LOAD=',F8.4)
      STOP
      END


Blocking E.R.M.


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 M0, M2, M
      WRITE (6,10)
   10 FORMAT (/,1X,6X,'BLOCKING E.R.M.',
     2 //,1X,'TRUNKS = ')
      READ (5,20) C
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'OFFERED LOAD = ')
      READ (5,20) M
      WRITE (6,40)
   40 FORMAT (1X,'PEAKEDNESS = ')
      READ (5,50) Z
   50 FORMAT (F7.4)
      A = M*Z+3.*Z*(Z-1.)
      X = A*(M+Z)/(M+Z-1.)-M-1.
C     EQUIVALENT RANDOM PARAMETERS (X,A)
      DO 80 K=1,25
      CALL ERLNG (A,X,B)
      G = (X+1.)/A+B
      A = A-(A*B-M)/B/(X+M+1.-A-
     2 (X+M+1.)*DLOG(G))
      X = A*(M+Z)/(M+Z-1.)-M-1.
   80 CONTINUE
C     OVERFLOW FROM X+C TRUNKS
      X = X+C
      CALL ERLNG (A,X,B)
      P1 = B
      M0 = A*B
      Z0 = 1.-M0+A/(X+1.+M0-A)
      M2 = IDINT(10000.*M0+.5)/10000.
      Z2 = IDINT(10000.*Z0+.5)/10000.
      X = X-C
      CALL ERLNG (A,X,B)
      P = P1/B
      P2 = IDINT(10000.*P+.5)/10000.
      WRITE (6,100) P2,M2,Z2
  100 FORMAT (/,1X,'BLOCKING E.R.M. =',
     2 F6.4,/,1X,'OVERFLOW TRAFFIC =',
     3 F8.4,/,1X,'OVERFLOW PEAKEDNESS =',
     4 F8.4)
      STOP
      END




Three-Moment Match 


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 L, L1, M, M1
      DIMENSION D(3)
      WRITE (6,10)
   10 FORMAT (/,1X,5X,'THREE MOMENT',
     2 'MATCH',//,1X,'EQUIV. TRUNKS = ')
      READ (5,20) X
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'EQUIV. LOAD= ')
      READ (5,20) A
      B = 1.
      N = IDINT(X)
      H = X-N
      DO 50 J=1,N
      B = J/A*B+1.
   50 CONTINUE
      B1 = (N+1)/A*B+1.
      B2 = (N+2.)/A*B1+1.
      Y = B**(1.-H)*B1**H
      Y = Y*(B1*B1/B/B2)**(H*(1.-H)/2.)
C     D(1),D(2),D(3) ARE DELTA0,DELTA1,DELTA2 OF (92),(93)
      D(1) = 1./Y
      B3 = (X+1.)/A*Y+1.
      D(2) = Y/A/(B3-Y)
      B4 = (X+2.)/A*B3+1.
      D(3) = 2./A/A*Y/D(2)/(B4-2.*B3+Y)
C     L,W,G ARE LAMBDA,OMEGA,GAMMA OF (94)-(96)
      L = D(3)*(D(2)-D(1))-D(1)*(D(3)-D(2))
      L = L/(2.*D(2)-D(1)-D(3))*A
      W = D(1)/L*(L-A*D(2))/(D(2)-D(1))
      G = W/A*(L-A*D(1))/D(1)
      L1 = IDINT(10000.*L+.5)/10000.
      W1 = IDINT(10000.*W+.5)/10000.
      G1 = IDINT(10000.*G+.5)/10000.
      M = L*W/(G+W)
      M1 = IDINT(10000.*M+.5)/10000.
      Z = 1.+L*G/(G+W)/(1.+G+W)
      Z1 = IDINT(10000.*Z+.5)/10000.
      WRITE (6,80) L1,W1,G1,M1,Z1
   80 FORMAT (/,1X,'L=',F7.4,/,1X,
     2 'W=',F7.4,/,1X,'G=',F7.4,/,1X,
     3 'M=',F7.4,/,1X,'Z=',F7.4)
      STOP
      END


Blocking Day-to-Day Time 


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 M,MB
      WRITE (6,10)
   10 FORMAT (/,1X,4X,'BLOCKING-DAYTO',
     2 'DAY TIME',//,1X,'TRUNKS = ')
      READ (5,20) X
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'MEAN LOAD = ')
      READ (5,40) M
   40 FORMAT (F7.4)
      WRITE (6,50)
   50 FORMAT (1X,'VARIANCE= ')
      READ (5,60) V
   60 FORMAT (F7.4)
C     TWO-POINT WEIGHTS AND LOADS, (103)-(106)
      W1 = (1.-DSQRT(V/(M*M+V)))/2.
      A = (M*M+V+DSQRT(V*(M*M+V)))/M
      A1 = A
      CALL ERLNG (A,X,B)
      MB = W1*B
      W2 = 1.-W1
      A = (M*M+V-DSQRT(V*(M*M+V)))/M
      A2 = A
      CALL ERLNG (A,X,B)
      MB = MB+W2*B
      C = IDINT(10000.*MB+.5)/10000.
      WRITE (6,90) C
   90 FORMAT (/,1X,'MEAN BLOCKING = ',
     2 F6.4,////)
      STOP
      END




Blocking Day-to-Day Call 


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 M,MB
      WRITE (6,10)
   10 FORMAT (/,1X,4X,'BLOCKING-DAYTO',
     2 'DAY CALL',//,1X,'TRUNKS = ')
      READ (5,20) X
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'MEAN LOAD =')
      READ (5,40) M
   40 FORMAT (F7.4)
      WRITE (6,50)
   50 FORMAT (1X,'VARIANCE= ')
      READ (5,60) V
   60 FORMAT (F7.4)
      W1 = (1.-DSQRT(V/(M*M+V)))/2.
      A = (M*M+V+DSQRT(V*(M*M+V)))/M
      A1 = A
      CALL ERLNG (A,X,B)
      MB = W1*A*B
      W2 = 1.-W1
      A = (M*M+V-DSQRT(V*(M*M+V)))/M
      A2 = A
      CALL ERLNG (A,X,B)
      MB = MB+W2*A*B
C     CALL CONGESTION = MEAN OVERFLOW / MEAN LOAD
      C = IDINT(10000.*MB/M+.5)/10000.
      WRITE (6,90) C
   90 FORMAT (/,1X,'MEAN BLOCKING = ',
     2 F6.4,///)
      STOP
      END


      SUBROUTINE ERLNG (A,X,B)
      IMPLICIT REAL*8 (A-H,O-Z)
C     ERLANG LOSS B(X,A) FOR NONINTEGER X
      N = IDINT(X)
      H = X-N
      B = 1.
      DO 30 J=1,N
      B = J/A*B+1.
   30 CONTINUE
      B1 = (N+1.)/A*B+1.
      B2 = (N+2.)/A*B1+1.
      Y = B**(1.-H)*B1**H
      Y = Y*(B1*B1/B/B2)**(H*(1.-H)/2.)
      B = 1./Y
      RETURN
      END


Waiting Time in GT/M/S Queue 


      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 L,M,N1
      INTEGER S
      DIMENSION D(3)
      WRITE (6,10)
   10 FORMAT (/,1X,'WAITING TIME IN GT/M/S',
     2 'QUEUE',//,1X,'OFFERED LOAD =')
      READ (5,20) M
   20 FORMAT (F7.4)
      WRITE (6,30)
   30 FORMAT (1X,'PEAKEDNESS = ')
      READ (5,40) Z
   40 FORMAT (F7.4)
      WRITE (6,50)
   50 FORMAT (1X,'NO. OF SERVERS = ')
      READ (5,60) S
   60 FORMAT (I3)
      A = M*Z+3.*Z*(Z-1.)
      X = A*(M+Z)/(M+Z-1.)-M-1.
C     EQUIVALENT RANDOM PARAMETERS USING SERIES (123)
      DO 100 K=1,30
      N = X+A
      U2 = 1.
      Y = 1.
      DO 80 J=1,N
      U2 = U2*(X-J+1)/A
      Y = Y+U2
      IF (DABS(U2).LE.1E-7) GO TO 90
   80 CONTINUE
   90 B = 1./Y
      G = (X+1.)/A+B
      N1 = A*B-M
      D1 = B*(X+M+1.-A-(X+M+1.)*DLOG(G))
      A = A-N1/D1
      X = A*(M+Z)/(M+Z-1.)-M-1.
  100 CONTINUE
      IF (DABS(U2).GT.1E-7) GO TO 200
C     THREE-MOMENT MATCH, (92)-(96)
      D(1) = B
      B3 = (X+1.)/A*Y+1.
      D(2) = Y/A/(B3-Y)
      B4 = (X+2.)/A*B3+1.
      D(3) = 2./A/A*Y/D(2)/(B4-2.*B3+Y)
      L = D(3)*(D(2)-D(1))-D(1)*(D(3)-D(2))
      L = L/(2.*D(2)-D(1)-D(3))*A
      W = D(1)/L*(L-A*D(2))/(D(2)-D(1))
      G = W/A*(L-A*D(1))/D(1)
C     D2 IS DELTA OF (115)
      V = (L+G+W)*S-S*S
      V1 = (G+W)*S-L*W
      D2 = (DSQRT(V*V+4.*V1*S*S)-V)
     2 /2./S/S
      C = 1.
      E = 1.
      U1 = 1./D2
      DO 150 J=1,S
      E = E*(S-J+1.)/J
      C = C*(1./F(J,L,W,G)-1.)
      U = E/C/F(J,L,W,G)*(S*F(J,L,W,G)
     2 -J)/(S*D2-J)
      U1 = U1+U
  150 CONTINUE
      R = 1.-D2
      P = 1./U1/D2
      W1 = P/D2/S
      WRITE (6,170) R,P,W1
  170 FORMAT (/,1X,'ROOT=',F8.4,/,
     2 1X,'P(W>0) =',F6.4,/,
     3 1X,'MEANW=',F8.4)
      STOP
  200 WRITE (6,210)
  210 FORMAT (/,1X,'NOT ACCURATE')
      STOP
      END
      REAL FUNCTION F*8(J,L,W,G)
C     F IS 1 - F(J), WITH F(S) AS IN (98)
      IMPLICIT REAL*8 (A-H,O-Z)
      REAL*8 L
      F = (J*J+(G+W)*J)/(J*J+
     2 (L+G+W)*J+L*W)
      RETURN
      END
On the Capacity of Sticky Storage Devices 
(Manuscript received December 26, 1983) 


When required to make a transition to a new state, a memory cell may, 
with a probability dependent on the state, refuse to do so (i.e., “stick”). 
Assuming that error correcting codes can be used at each write-read cycle, one 
seeks the maximum error-free (in the Shannon sense), long-term average 
capacity per cell and cycle. This problem is solved here for binary cells with 
either unilateral or symmetric stickiness. The methods used apply to more 
general cases as well. In the Appendix, some essential inequalities of dynamic 
programming are demonstrated. 


I. INTRODUCTION 


Information theoretic studies of storage devices have been mostly 
concerned with overcoming the existence of subsets of permanently 
defective cells.. We are concerned instead with the case of identical 
cells with the deficiency that use of a cell in one write-read cycle 
affects the cell’s behavior in the next cycle. An extreme example of 
this is “write-once” memory,” such as punched paper tape or optical 
disks, where the long-term average throughput per cell and cycle is, of 
course, zero. A previous paper® considers a deterministic cell model in 
which the aftereffect of usage is of limited duration, permitting positive 
average rates. The present paper similarly considers the perhaps more 
realistic case of stochastic cells. 

In this model, the store has N cells, each of which can be viewed as 
an input-output automaton. The store is used for T successive cycles. 


* AT&T Bell Laboratories. 





Copyright © 1984 AT&T. Photo reproduction for noncommercial use is permitted with- 
out payment of royalty provided that each reproduction is done without alteration and 
that the Journal reference and copyright notice are included on the first page. The title 
and abstract, but no other portions, of this paper may be copied or distributed royalty 
free by computer-based and other information-service systems without further permis- 
sion. Permission to reproduce or republish any other portion of this paper must be 
obtained from the Editor. 
At each cycle, fresh data from a source $S_t$, independent of all earlier 
sources, are encoded into an N-vector $X_t$ of inputs applied to the cells. 
The resulting N-vector $Y_t$ of cell outputs goes to a decoder that must 
reproduce the source output with a probability of error approaching 0 
as N increases. Note that if cell states can only change on inputs, not 
spontaneously, then the reading operation may be repeated any number 
of times in the same cycle with the same results. At each cycle 
only the probability distribution of the state of the store is known. 
This distribution, together with the cell model, defines a channel 
relating $Y_t$ to $X_t$. The difficulty of the problem is that operating at or 
near the capacity of that channel at one cycle may leave an unfavorable 
state distribution for the next cycle. Thus, the largest possible number 
of bits per cell and cycle that are accurately recoverable can only be 
determined by dynamic programming. To make such a program manageable, 
a theorem is first proven that permits one to obtain the limit 
for large N at each cycle by considerations involving a single cell. This 
theorem applies to any cell for which state and output are one and the 
same. 

Results are obtained for two types of binary cells. For the first type, 
only one of the two states is “sticky”: when the cell is in that state 
and the input requires transition to the other state, there is a probability 
$\epsilon$ that the transition does not take place. For the second type, 
both states are assumed sticky in the same sense. The computations 
reveal that for both models the long-term policy is steady rather than 
periodic. For the symmetric case, this implies that the maximum 
throughput per cell and cycle is just the capacity of the binary 
symmetric channel with crossover probability one-half the sticking 
probability. 


II. INFORMATION-THEORETIC BOUNDS 


Let $\mathcal{X}$, $\mathcal{Y}$, and $\mathcal{W}$ be finite alphabets, and let $X_t^i \in \mathcal{X}$ denote the 
input and $Y_t^i \in \mathcal{Y}$ the output of cell i at cycle t. Let $W_t^i \in \mathcal{W}$ be 
independent random variables representing the internal random effects 
of cell i at cycle t. 

The initial state of cell i is represented by $Y_0^i$ in $\mathcal{Y}$. We use the 
notation $X_t = (X_t^1, \ldots, X_t^N)$, $Y_t = (Y_t^1, \ldots, Y_t^N)$, $W_t = (W_t^1, \ldots, W_t^N)$. 
The quantities $X_1, \ldots, X_T$, $Y_0^1, \ldots, Y_0^N$, $W_1^1, \ldots, W_T^N$ (a total 
of T vectors in $\mathcal{X}^N$, N variables in $\mathcal{Y}$, and NT variables in $\mathcal{W}$) are 
jointly independent. 

The distributions of the $Y_0^i$ and $W_t^i$ are given, together with the 
equation describing the operation of the cells 

$$Y_t^i = f_t^i(X_t^i, Y_{t-1}^i, W_t^i). \tag{1}$$
The distribution of the $X_t$ is dependent upon the choice of encoding 
and the source statistics. 

The case of interest is the “homogeneous” one where: 

1. $f_t^i$ is the same function f for all i and t. 

2. The $Y_0^i$ have the same distribution as a generic $Y_0$ for all i. 

3. The $W_t^i$ have the same distribution as a generic W for all i and t. 

The more general case is allowed in the model because the main 
theorem below is valid without the homogeneity assumptions. 

At each cycle t, the encoder and decoder are designed with knowledge 
of the distribution of past events but not of their realizations. Thus, 
neither encoder nor decoder knows the exact state of the memory (i.e., 
the vector $Y_{t-1}$) at the end of the previous cycle. As $W_t$ is also known 
only in distribution, the memory appears as a noisy channel with 
input $X_t$ and output $Y_t$. Then the entropy of the message that can be 
reconstructed with negligible error at cycle t is bounded, in view of the 
data-processing theorem,* by 

$$I(X_t; Y_t),$$

where I is the mutual information. For the total throughput, one thus 
has the upper bound 

$$\sum_{t=1}^{T} I(X_t; Y_t). \tag{2}$$

Theorem 1: 

$$\sum_{t=1}^{T} I(X_t; Y_t) \le \sum_{i=1}^{N} \sum_{t=1}^{T} I(X_t^i; Y_t^i). \tag{3}$$


Proof: By the independence of the initial states, the internal disturbances, 
and the sources, one has (H denoting entropy) 

$$H(Y_0) = \sum_{i=1}^{N} H(Y_0^i), \tag{4}$$

$$H(W_t) = \sum_{i=1}^{N} H(W_t^i), \qquad t = 1, \ldots, T, \tag{5}$$

and also 

$$H(X_t^i Y_t^i W_t^i Y_{t-1}^i) = H(X_t^i) + H(Y_{t-1}^i) + H(W_t^i), \tag{6}$$

since $Y_t^i$ is determined by (1), and $Y_{t-1}^i$ depends only on the initial 
state, the earlier source encodings, and the earlier disturbances, all 
independent of $X_t$, hence of $X_t^i$. And for the same reason 

$$H(X_t Y_t W_t Y_{t-1}) = H(X_t) + H(Y_{t-1}) + H(W_t). \tag{7}$$

The following four relations are standard properties of entropy: 

$$H(X_t Y_t W_t Y_{t-1}) - H(X_t Y_t) = H(Y_{t-1} W_t \mid X_t Y_t), \tag{8}$$

$$H(X_t^i Y_t^i W_t^i Y_{t-1}^i) - H(X_t^i Y_t^i) = H(Y_{t-1}^i W_t^i \mid X_t^i Y_t^i), \tag{9}$$

$$H(Y_{t-1} W_t \mid X_t Y_t) \le \sum_{i=1}^{N} H(Y_{t-1}^i W_t^i \mid X_t^i Y_t^i), \tag{10}$$

$$H(Y_T) \le \sum_{i=1}^{N} H(Y_T^i). \tag{11}$$

* If X and Z are conditionally independent given Y, then $I(X; Z) \le I(X; Y)$. 


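Using these facts one has 

$$\begin{aligned}
\sum_{t=1}^{T} I(X_t; Y_t)
&= \sum_{t=1}^{T} \big[H(X_t) + H(Y_t) - H(X_t Y_t)\big] && \text{(by definition)} \\
&= \sum_{t=1}^{T} \big[H(Y_t) + H(X_t Y_t W_t Y_{t-1}) - H(X_t Y_t) - H(Y_{t-1}) - H(W_t)\big] && \text{[by (7)]} \\
&= H(Y_T) - H(Y_0) + \sum_{t=1}^{T} \big[H(Y_{t-1} W_t \mid X_t Y_t) - H(W_t)\big] && \text{[by summation and (8)]} \\
&\le \sum_{i=1}^{N} \Big\{H(Y_T^i) - H(Y_0^i) + \sum_{t=1}^{T} \big[H(Y_{t-1}^i W_t^i \mid X_t^i Y_t^i) - H(W_t^i)\big]\Big\} && \text{[by (11), (4), (10), and (5)]} \\
&= \sum_{i=1}^{N} \sum_{t=1}^{T} \big[H(Y_t^i) - H(Y_{t-1}^i) - H(W_t^i) + H(X_t^i Y_t^i W_t^i Y_{t-1}^i) - H(X_t^i Y_t^i)\big] && \text{[by summation and (9)]} \\
&= \sum_{i=1}^{N} \sum_{t=1}^{T} \big[H(Y_t^i) + H(X_t^i) - H(X_t^i Y_t^i)\big] && \text{[by (6)]} \\
&= \sum_{i=1}^{N} \sum_{t=1}^{T} I(X_t^i; Y_t^i) && \text{(by definition),}
\end{aligned}$$

which was to be proved. 

In the homogeneous case, the problem of finding the maximum M 
of $\sum_{t=1}^{T} I(X_t^i; Y_t^i)$ over all distributions of the independent variables 
$X_1^i, \ldots, X_T^i$ subject to (1) is the same for all i. Having found M by 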
solving this generic single-cell problem, a bound of NM is established 
for the total throughput. On the other hand, a throughput per cell, 
arbitrarily close to M, can be achieved for large enough N by choosing 
encoders with code words picked at random with each $X_t^i$ independently 
having the maximizing distribution of the generic problem at 
stage t. Thus, the bound is sharp, asymptotically in N. 

The bound is obtained, for any given cell model, by solving a T- 
stage, finite horizon, dynamic program. For large T it is the maximum 
long-term average per cycle (and cell) that is of prime interest. Some 
of the dynamic programming issues are discussed in the Appendix. 


III. UNILATERAL STICKINESS 


It is assumed that a binary cell acquires its input as new state when 
its previous state was 0, but when the previous state was 1 and the 
input is 0 the cell remains stuck at 1 with probability $\epsilon$. 

This is modeled, with $X_t, Y_t, W_t \in \{0, 1\}$, by letting $W_t = 1$ with 
probability $\epsilon$ and 0 with probability $1 - \epsilon$, and, in ordinary arithmetic, 

$$Y_t = X_t + (1 - X_t)\,Y_{t-1} W_t, \qquad t = 1, \ldots, T. \tag{12}$$

Let 

$$p_t = \Pr\{X_t = 0\}, \tag{12a}$$

$$s_t = \Pr\{Y_t = 0\}, \tag{12b}$$

and 

$$\epsilon = \Pr\{W_t = 1\}. \tag{12c}$$

Then (12) implies 

$$s_t = p_t\big(1 - \epsilon(1 - s_{t-1})\big), \tag{13}$$

and one obtains, with h the binary entropy function, 

$$I(X_t; Y_t) = h\big(p_t(1 - \epsilon(1 - s_{t-1}))\big) - p_t\,h\big(\epsilon(1 - s_{t-1})\big) = h(s_t) - p_t\,h\!\left(\frac{s_t}{p_t}\right). \tag{14}$$
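
As a check on (13) and (14), the mutual information can also be computed directly from the joint distribution implied by (12). The Python sketch below is illustrative only; the function names are hypothetical and logarithms are taken in base 2.

```python
import math


def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def mutual_information_formula(p, s_prev, eps):
    """I(X_t; Y_t) from (13) and (14)."""
    s = p * (1.0 - eps * (1.0 - s_prev))
    return h2(s) - p * h2(eps * (1.0 - s_prev))


def mutual_information_enumerated(p, s_prev, eps):
    """I(X_t; Y_t) by enumerating (X_t, Y_{t-1}, W_t) through the cell law (12)."""
    joint = {}
    for x, px in ((0, p), (1, 1.0 - p)):
        for y_prev, py in ((0, s_prev), (1, 1.0 - s_prev)):
            for w, pw in ((0, 1.0 - eps), (1, eps)):
                y = x + (1 - x) * y_prev * w
                joint[(x, y)] = joint.get((x, y), 0.0) + px * py * pw
    p_x = {0: p, 1: 1.0 - p}
    p_y = {y: sum(v for (_, yy), v in joint.items() if yy == y) for y in (0, 1)}
    return sum(
        v * math.log2(v / (p_x[x] * p_y[y]))
        for (x, y), v in joint.items()
        if v > 0.0
    )


if __name__ == "__main__":
    print(mutual_information_formula(0.6, 0.3, 0.2))
    print(mutual_information_enumerated(0.6, 0.3, 0.2))
```

The two functions should agree to rounding error for any p, previous-state probability, and $\epsilon$ in the open unit interval.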


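This leads to the dynamic program (where $\tau$ denotes the number of 
cycles to go) 

$$V_\tau(s) = \max_{p}\,\Big\{h\big(p(1 - \epsilon + \epsilon s)\big) - p\,h\big(\epsilon(1 - s)\big) + V_{\tau-1}\big(p(1 - \epsilon + \epsilon s)\big)\Big\} \tag{15}$$

with $V_0(s) = 0$. 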
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This is an easy task for a computer, and the results show that p and 
s soon stabilize around steady-state values. Furthermore, as the Ap- 
pendix shows, using the value functions found in the finite horizon 
solution, one can derive both upper and lower bounds on the optimal 
long-term average per cycle. In this problem these bounds soon agree 
to many decimals. The computer results are used only to conclude 
that the long-term optimum is steady. (This cannot be taken for 
granted, as often such problems have periodic solutions, crop rotation 
being the most ancient example of this.) 

In a steady state, the constant values of p and s must satisfy, by 
(13), the relation 

$$s = p\big(1 - \epsilon(1 - s)\big) \tag{16}$$

and by (14) the optimal throughput per cell and cycle is the maximum 
of 

$$h(s) - p\,h\!\left(\frac{s}{p}\right) \tag{17}$$

subject to (16). 

This amounts to maximizing a transcendental function on the unit 
interval. While no closed-form solution is known, the maximum, and 
the corresponding p and s are easily computed, and they are given as 
a function of $\epsilon$ in Table I. (The maximum is given in base 2, i.e., bits 
per cell and cycle.) It is important to note that for fixed T, the dynamic 
program (15) defines a total throughput which can only be approached 
from below as N increases. On the other hand, the long-range average 
is approached from above, because for finite T the transient effect of 
an initially clear memory will permit a slightly higher total. 
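
The steady-state maximization is simple to carry out numerically. The following Python sketch is one possible approach, assumed here for illustration (a plain grid search over p rather than whatever method the author used): it eliminates s through (16) and maximizes (17) in bits. Function names are hypothetical.

```python
import math


def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def steady_state(p, eps):
    """Solve (16) for s and return (throughput (17), s)."""
    s = p * (1.0 - eps) / (1.0 - eps * p)
    return h2(s) - p * h2(s / p), s


def maximize_throughput(eps, grid=20000):
    """Grid search over p in (0, 1); returns (maximum, p, s)."""
    best = (-1.0, 0.0, 0.0)
    for k in range(1, grid):
        p = k / grid
        value, s = steady_state(p, eps)
        if value > best[0]:
            best = (value, p, s)
    return best


if __name__ == "__main__":
    print(maximize_throughput(0.10))
```

For $\epsilon = 0.10$ the result can be compared with the corresponding row of Table I (maximum 0.84967335, p = 0.51316702, s = 0.48683298).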

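As the case of small $\epsilon$ is of greatest interest we note the expansions 
[following from (16) and (17)]: at the maximum 

$$s = \frac{1}{2} - \frac{\epsilon}{8} + o(\epsilon), \tag{18}$$

and 

$$p = \frac{1}{2} + \frac{\epsilon}{8} + o(\epsilon), \tag{19}$$

and the value of the maximum is 

$$\log 2 - \frac{1}{2}\,h\!\left(\frac{\epsilon}{2}\right) + o(\epsilon). \tag{20}$$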
Table I—Results for unilateral sticking 

Epsilon   Maximum       Input p       State s 
0.00      1.00000000    0.50000000    0.50000000 
0.01      0.97718297    0.50125629    0.49874371 
0.02      0.95921284    0.50252532    0.49747469 
0.03      0.94300294    0.50380733    0.49619267 
0.04      0.92790483    0.50510257    0.49489743 
0.05      0.91361310    0.50641131    0.49358869 
0.06      0.89994925    0.50773381    0.49226619 
0.07      0.88679616    0.50907034    0.49092966 
0.08      0.87407111    0.51042119    0.48957881 
0.09      0.86171265    0.51178665    0.48821335 
0.10      0.84967335    0.51316702    0.48683298 
0.15      0.79315764    0.52030370    0.47969631 
0.20      0.74099706    0.52786405    0.47213596 
0.25      0.69176025    0.53589839    0.46410162 
0.30      0.64460887    0.54446658    0.45553343 
0.35      0.59898428    0.55364065    0.44635936 
0.40      0.55447912    0.56350833    0.43649168 
0.45      0.51077418    0.57417812    0.42582189 
0.50      0.46760281    0.58578644    0.41421357 
0.55      0.42472834    0.59850838    0.40149163 
0.60      0.38192756    0.61257412    0.38742589 
0.65      0.33897650    0.62829543    0.37170459 
0.70      0.29563534    0.64611064    0.35388938 
0.75      0.25162917    0.66666668    0.33333334 
0.80      0.20661826    0.69098302    0.30901701 
0.85      0.16014419    0.72082550    0.27917453 
0.90      0.11151237    0.75974695    0.24025309 
0.95      0.05945393    0.81725604    0.18274403 
0.96      0.04841568    0.83333337    0.16666671 
0.97      0.03707783    0.85236594    0.14763415 
0.98      0.02536387    0.87610072    0.12389940 
0.99      0.01313736    0.90909100    0.09090918 


IV. SYMMETRIC STICKINESS 

Suppose that, when its input would require a change of state, a cell 
remains, with probability $\epsilon$, in its former state. 


This is modeled, with $X_t, Y_t, W_t \in \{0, 1\}$, by letting $W_t = 1$ with 
probability $\epsilon$ and 

$$Y_t = (1 - W_t) X_t + W_t Y_{t-1}. \tag{21}$$

Let 

$$p_t = \Pr\{X_t = 0\}, \tag{22a}$$

$$s_t = \Pr\{Y_t = 0\}, \tag{22b}$$

$$\epsilon = \Pr\{W_t = 1\}. \tag{22c}$$

Then (21) implies 

$$s_t = (1 - \epsilon)\,p_t + \epsilon\,s_{t-1} \tag{23}$$

and 

$$I(X_t; Y_t) = h\big((1 - \epsilon)p_t + \epsilon s_{t-1}\big) - p_t\,h\big(\epsilon(1 - s_{t-1})\big) - (1 - p_t)\,h(\epsilon s_{t-1}). \tag{24}$$


Note that, if both states are equally likely, then at the next cycle 
one faces a binary symmetric channel with crossover probability $\epsilon/2$. 
The capacity of this channel 

$$\log 2 - h(\epsilon/2) \tag{25}$$


is attained by choosing symmetrically distributed input and this will 
lead to a symmetric distribution of the next state, so that this situation 
perpetuates itself. So (25) is an achievable long-term average, and it 
will turn out to be the best possible. 

Assuming only that the optimum is time-invariant, one has, by (23), 
the condition 


$$s = p, \tag{26}$$

and this reduces the problem to showing that the maximum, over 
$0 \le s \le 1$, of 

$$h(s) - s\,h\big(\epsilon(1 - s)\big) - (1 - s)\,h(\epsilon s) \tag{27}$$

is at s = 1/2, where its value is $\log 2 - h(\epsilon/2)$. When s = 0 or 1, (27) 
vanishes. For fixed s in (0, 1) let 

$$F(\epsilon) = \log 2 - h(\epsilon/2) - h(s) + (1 - s)\,h(\epsilon s) + s\,h\big(\epsilon(1 - s)\big). \tag{28}$$

It suffices to show that $F(\epsilon) \ge 0$ for $0 \le \epsilon \le 1$. One has $F(1) = F'(1) 
= 0$ and 

$$F''(\epsilon) = \frac{(1 - \epsilon)\big(1 - 4s(1 - s)\big)}{\epsilon\,(2 - \epsilon)(1 - \epsilon s)(1 - \epsilon + \epsilon s)} \ge 0. \tag{29}$$

So F is convex and tangent to the $\epsilon$ axis at $\epsilon = 1$, hence nonnegative, 
as required. 
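
The claim that (27) peaks at s = 1/2 with value (25) can be spot-checked numerically. The Python sketch below is an illustration with hypothetical function names; it evaluates (27) on a grid that contains s = 1/2 and compares the maximum with the binary symmetric channel capacity in bits.

```python
import math


def h2(q):
    """Binary entropy in bits, with h2(0) = h2(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def symmetric_throughput(s, eps):
    """Steady-state throughput (27) for state probability s = p."""
    return h2(s) - s * h2(eps * (1.0 - s)) - (1.0 - s) * h2(eps * s)


def bsc_capacity(eps):
    """Capacity (25) of a binary symmetric channel with crossover eps/2."""
    return 1.0 - h2(eps / 2.0)


def best_on_grid(eps, grid=1000):
    """Return (maximum of (27), maximizing s) over s = k/grid."""
    return max((symmetric_throughput(k / grid, eps), k / grid)
               for k in range(grid + 1))


if __name__ == "__main__":
    for eps in (0.10, 0.50, 0.90):
        print(eps, best_on_grid(eps), bsc_capacity(eps))
```

The capacities printed for $\epsilon$ = 0.10 and 0.50 can be compared with the "Formula" column of Table II.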
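To confirm the steady-state solution, one runs the finite horizon 
dynamic program, with $\tau$ stages to go: 

$$V_\tau(s) = \max_{p}\,\Big\{h\big((1 - \epsilon)p + \epsilon s\big) - p\,h\big(\epsilon(1 - s)\big) - (1 - p)\,h(\epsilon s) + V_{\tau-1}\big((1 - \epsilon)p + \epsilon s\big)\Big\} \tag{30}$$

with $V_0(s) = 0$. 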

Using a grid of probabilities including p = s = 1/2, the upper and
lower bounds on the long-term average (see Appendix), derived from
the value functions, converge towards each other. In Table II the
bounds shown have been obtained by iterating until the difference is
below $5 \cdot 10^{-9}$, and they are compared with formula (25). All
logarithms are in base 2.

Table II—Upper and lower bounds on optimal long-term average per cycle

Epsilon   Bounds                      Formula
0.05      [0.83133907, 0.83133907]    0.8313390685
0.10      [0.71360304, 0.71360304]    0.7136030429
0.15      [0.61568846, 0.61568846]    0.6156884559
0.20      [0.53100441, 0.53100441]    0.5310044064
0.25      [0.45643556, 0.45643556]    0.4564355568
0.30      [0.39015970, 0.39015970]    0.3901596953
0.35      [0.33098416, 0.33098416]    0.3309841649
0.40      [0.27807191, 0.27807191]    0.2780719051
0.45      [0.23080717, 0.23080717]    0.2308071710
0.50      [0.18872188, 0.18872188]    0.1887218755
0.55      [0.15145182, 0.15145182]    0.1514518217
0.60      [0.11870910, 0.11870910]    0.1187091008
0.65      [0.09026388, 0.09026388]    0.0902638775
0.70      [0.06593194, 0.06593194]    0.0659319446
0.75      [0.04556600, 0.04556600]    0.0455659971
0.80      [0.02904941, 0.02904941]    0.0290494055
0.85      [0.01629174, 0.01629174]    0.0162917374
0.90      [0.00722555, 0.00722555]    0.0072255460
0.95      [0.00180412, 0.00180412]    0.0018041210
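The grid computation described above can be sketched as follows. This is not the author's program: it is a minimal illustration that restricts the state $s$ to a uniform grid containing 1/2, uses the relation form of the Appendix (next states instead of controls), and reports the bounds $m$ and $M$ of Theorem 2 after a fixed number of sweeps. The grid size, the sweep count, and all names are assumptions of the sketch.

```python
import numpy as np

def h(x):
    """Binary entropy in bits, elementwise, with h(0) = h(1) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > 0.0) & (x < 1.0)
    xi = x[inside]
    out[inside] = -xi * np.log2(xi) - (1.0 - xi) * np.log2(1.0 - xi)
    return out

def transition_rewards(grid, eps):
    """Reward matrix R[i, j] for moving from grid[i] to grid[j]; -inf if unreachable."""
    s = grid[:, None]                       # current state Pr{Y_{t-1} = 0}
    s_next = grid[None, :]                  # candidate next state
    p = (s_next - eps * s) / (1.0 - eps)    # input law that reaches s_next, from (23)
    feasible = (p >= -1e-12) & (p <= 1.0 + 1e-12)
    p = np.clip(p, 0.0, 1.0)
    reward = h(s_next) - p * h(eps * (1.0 - s)) - (1.0 - p) * h(eps * s)
    return np.where(feasible, reward, -np.inf)

def average_return_bounds(eps, n_grid=41, n_iter=200):
    """Return (m, M): lower and upper bounds on the long-term average per cycle."""
    grid = np.arange(n_grid) / (n_grid - 1)        # contains s = 1/2 when n_grid is odd
    R = transition_rewards(grid, eps)
    V = np.zeros(n_grid)
    for _ in range(n_iter):
        V_hat = np.max(R + V[None, :], axis=1)     # one dynamic-programming sweep
        diff = V_hat - V
        V = V_hat - V_hat.min()                    # a constant shift leaves the bounds unchanged
    return float(diff.min()), float(diff.max())
```

Calling `average_return_bounds(0.2)` should return an interval containing the value $1 - h(0.1) \approx 0.5310$ listed in Table II; how quickly the interval tightens depends on the grid and the number of sweeps.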


V. CONCLUSIONS 


While complex coding may be impractical, the value of the above 
results is that they provide precise bounds on what is achievable. One 
notes, comparing (20) with (25), that two-sided stickiness is, for the 
same small sticking probability $\epsilon$, twice as damaging as one-sided
stickiness. The case of two different positive sticking probabilities can 
be handled by the same techniques. 

Note that the problem can be considered in another light by inter- 
changing time and space. It then becomes a special case of the general 
interference channel, as shown in an earlier paper.® 
APPENDIX
Remarks on Dynamic Programming 
A.1 Introduction 


The literature on dynamic programming (e.g., Refs. 4, 6, 7) is 
primarily devoted to the important case of stochastic systems (con- 
trolled Markov chains). Deterministic dynamic programs are covered 
as a special case of degenerate probabilities. As a result, the theorems 
proven in most texts and papers make assumptions that are not 
necessary for the most important deterministic results we need. Among 
such assumptions the following two are particularly bothersome: 

1. The state space must be finite. 

2. All policies must be ergodic. 

The first assumption rules out even the case of a real interval as state 
space (which is our case). The second is essentially never true in a 
deterministic setup. This is our justification for giving the very simple 
proof of the results we need in this paper. 

We are concerned with the optimal long-term average return per 
stage. It has been claimed that this case is only academic because 
either the number of stages is small, and then a finite horizon treat- 
ment is appropriate, or it is large, and then the time span involved is 
such that the time value of utility must be included by using the 
discounted model instead. In this paper, as millions of memory cycles 
can take place per second, we have a strong counterexample to this 
viewpoint: the undiscounted, long-term average is the appropriate 
quantity to study. 


A.2 The deterministic case 


The deterministic programs are defined by two sets, S, U, and two
functions: $r: S \times U \to \mathbb{R}$, $f: S \times U \to S$. Here S is the state space,
U the control variable set, r(s, u) the return using $s \in S$, $u \in U$, and
f(s, u) the next state. The finite horizon program is thus written


$V_n(s) = \sup_{u \in U} \, [\, r(s, u) + V_{n-1}(f(s, u)) \,],$ (31)


where $n$ is the number of stages to go and $V_0(s) = 0$.
It has long been known that one can eliminate U, r, and f in favor 
of a relation on states and a function defined on this relation. Specifically,
let $\rho(s) \subset S$ be defined by:


$\rho(s) = f(s, U),$ (32)


then the pairs $\{(s, s') \mid s' \in \rho(s)\}$ form a subset $\rho$ of $S \times S$. Also, for
$s' \in \rho(s)$, let


$R(s, s') = \sup\{ r(s, u) \mid f(s, u) = s' \}.$ (33)


In words, one need only know (i) which states one can go to next, and
(ii) what is the optimal value of going there. The finite horizon problem
then becomes

$V_n(s) = \sup_{s' \in \rho(s)} \, [\, R(s, s') + V_{n-1}(s') \,].$ (34)

This reformulation (which is not possible for stochastic problems) has
two advantages:

(i) Theoretical analysis is simplified.

(ii) In computation, when an infinite S is approximated by a finite
subset $S_1$, no requantization is needed, avoiding this cause of error
accumulation. One solves the problem for the subset, with $\rho$ and $R$
restricted to $S_1 \times S_1$.
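The passage from the control form (31) to the relation form (32) through (34) can be illustrated on a finite toy problem. The sketch below is hypothetical throughout (three states, three controls, an arbitrary return function chosen for illustration); it builds $\rho$ and $R$ and checks that both finite horizon recursions give the same value function.

```python
def to_relation_form(states, controls, r, f):
    """Build rho (successor sets, eq. 32) and R (best one-step return, eq. 33)."""
    rho = {s: set() for s in states}
    R = {}
    for s in states:
        for u in controls:
            s_next = f(s, u)
            rho[s].add(s_next)
            key = (s, s_next)
            R[key] = max(R.get(key, float("-inf")), r(s, u))
    return rho, R

def finite_horizon_relation(states, rho, R, n_stages):
    """Recursion (34), starting from V_0 = 0."""
    V = {s: 0.0 for s in states}
    for _ in range(n_stages):
        V = {s: max(R[(s, t)] + V[t] for t in rho[s]) for s in states}
    return V

def finite_horizon_direct(states, controls, r, f, n_stages):
    """Recursion (31), starting from V_0 = 0."""
    V = {s: 0.0 for s in states}
    for _ in range(n_stages):
        V = {s: max(r(s, u) + V[f(s, u)] for u in controls) for s in states}
    return V

# Hypothetical toy problem: move left, stay, or move right on {0, 1, 2}.
STATES = [0, 1, 2]
CONTROLS = [-1, 0, 1]

def toy_next_state(s, u):
    return min(2, max(0, s + u))

def toy_return(s, u):
    return float(s) - 0.5 * abs(u)

RHO, R_TABLE = to_relation_form(STATES, CONTROLS, toy_return, toy_next_state)
```

Two controls that lead to the same next state are merged into a single entry of `R_TABLE`, which is the elimination of U described in the text.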


A.3 Bellman’s equation for the long-term average 


If $\lambda$ denotes the optimal long-term average return per stage, then it
should satisfy the equation, already given by Bellman,

$\lambda + W(s) = \sup_{s' \in \rho(s)} \, [\, R(s, s') + W(s') \,]$ (35)

for all $s \in S$.

The unknown constant $\lambda$ and unknown function W (which matters
only modulo the addition of an arbitrary constant) are reminiscent of 
an eigenvalue equation. However, the maximum operator, while non- 
linear, has a more favorable numerical behavior. Of course (35) is 
stated on the assumption that the required limit exists: it assumes 
that a policy for which the lim sup is as large as possible actually gives 
a limit. This depends on the structure of relation p, but we will not 
pursue this question, as our need is rather for inequalities. 


A.4 Performance inequalities 


Let V be any bounded real function on the state space S. For
instance, V could be a value function obtained from a finite horizon
program, an average of several such functions (in periodic cases), or
just a plain wild guess. Then define $\hat{V}$ by

$\hat{V}(s) = \sup_{s' \in \rho(s)} \, [\, R(s, s') + V(s') \,].$ (36)

Let

$M = \sup_{s \in S} \, [\, \hat{V}(s) - V(s) \,],$ (37)

$m = \inf_{s \in S} \, [\, \hat{V}(s) - V(s) \,].$ (38)

Theorem 2: For any bounded V, with M and m defined by (36), (37), and (38):
(i) The supremum, over all admissible policies, of the limit inferior of
the average return is at least m. (ii) The limit superior of the average 
return, under any admissible policy, cannot exceed M. 

Proof: (i) The supremum on the right of (36) is attained, exactly or
within $\epsilon$, by choice of $s' \in \rho(s)$ for each s. Thus there exists, for any
$\epsilon > 0$, a function $\phi_\epsilon: S \to S$ such that, for all s in S,


$\hat{V}(s) \ge R(s, \phi_\epsilon(s)) + V(\phi_\epsilon(s)) \ge \hat{V}(s) - \epsilon,$ (39)

and $\phi_\epsilon(s) \in \rho(s)$. As V is bounded, let

$c = \sup V(s) - \inf V(s).$ (40)

From (38) one has $\hat{V}(s) - V(s) \ge m$, hence by (39),

$R(s, \phi_\epsilon(s)) + V(\phi_\epsilon(s)) - V(s) \ge m - \epsilon.$ (41)

Generate a feasible sequence of states from some initial $s_0$ by

$s_t = \phi_\epsilon(s_{t-1}), \quad t = 1, 2, \ldots.$ (42)


For this sequence

$R(s_0, s_1) + V(s_1) - V(s_0) \ge m - \epsilon$
$R(s_1, s_2) + V(s_2) - V(s_1) \ge m - \epsilon$ (43)
$\vdots$
$R(s_{t-1}, s_t) + V(s_t) - V(s_{t-1}) \ge m - \epsilon.$

Adding these relations gives

$\sum_{i=1}^{t} R(s_{i-1}, s_i) + V(s_t) - V(s_0) \ge t(m - \epsilon)$ (44)


and by (40), dividing by t,


$\frac{1}{t} \sum_{i=1}^{t} R(s_{i-1}, s_i) \ge m - \epsilon - \frac{c}{t},$ (45)

so that

$\liminf_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} R(s_{i-1}, s_i) \ge m - \epsilon$ (46)


and as $\epsilon$ is arbitrarily small, claim (i) is proved.
(ii) By (37), for any s,


$\hat{V}(s) - V(s) \le M,$ (47)

which, by (36), gives, for all s and $s' \in \rho(s)$,

$R(s, s') + V(s') - V(s) \le M.$ (48)

Thus if $s_0, s_1, s_2, \ldots$ is any admissible sequence [i.e., $s_{t+1} \in \rho(s_t)$ for
all t], then

$R(s_0, s_1) + V(s_1) - V(s_0) \le M,$
$\vdots$ (49)
$R(s_{t-1}, s_t) + V(s_t) - V(s_{t-1}) \le M.$

By addition


$\sum_{i=1}^{t} R(s_{i-1}, s_i) + V(s_t) - V(s_0) \le M t.$ (50)


Thus by (40), dividing by t,


$\frac{1}{t} \sum_{i=1}^{t} R(s_{i-1}, s_i) \le M + \frac{c}{t},$ (51)

so that

$\limsup_{t \to \infty} \frac{1}{t} \sum_{i=1}^{t} R(s_{i-1}, s_i) \le M,$ (52)


which was to be proved.

These inequalities have been known at least since the publication 
of Refs. 3 and 10, although their proof is submerged under unnecessary 
assumptions. The inequalities are crucial to any computer treatment 
of the long-run average. In addition, the above theorem proves the 
sufficiency of Bellman’s equation (35), for if this equation holds with
a bounded V and some $\lambda$, then for this V one obtains $m = M = \lambda$, and
thus $\lambda$ is optimal and achievable.
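The bounds of Theorem 2 are easy to evaluate once the relation and its returns are tabulated. The sketch below uses a hypothetical three-state graph whose best cycle (a to b and back) has average return 2; a wild guess V = 0 gives a loose bracket, while a V that solves Bellman's equation (35) gives m = M. The graph, its returns, and the function name are illustrative assumptions.

```python
def performance_bounds(states, rho, R, V):
    """Return (m, M) of (37)-(38) for a bounded guess V on a finite state set."""
    diffs = []
    for s in states:
        v_hat = max(R[(s, t)] + V[t] for t in rho[s])   # equation (36)
        diffs.append(v_hat - V[s])
    return min(diffs), max(diffs)

# Hypothetical graph: successors and one-step returns.
TOY_STATES = ["a", "b", "c"]
TOY_RHO = {"a": {"a", "b"}, "b": {"a", "c"}, "c": {"a"}}
TOY_R = {("a", "a"): 1.5, ("a", "b"): 1.0,
         ("b", "a"): 3.0, ("b", "c"): 0.0,
         ("c", "a"): 2.0}

# A wild guess brackets the optimal average 2 loosely.
LOOSE = performance_bounds(TOY_STATES, TOY_RHO, TOY_R, {"a": 0.0, "b": 0.0, "c": 0.0})

# A solution W of (35) with lambda = 2 makes the bounds coincide.
TIGHT = performance_bounds(TOY_STATES, TOY_RHO, TOY_R, {"a": 0.0, "b": 1.0, "c": 0.0})
```

Here `LOOSE` is `(1.5, 3.0)` and `TIGHT` is `(2.0, 2.0)`, matching the sufficiency remark above.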


AUTHOR 


Hans S. Witsenhausen, Ph.D. (Electrical Engineering), 1966, The Massa- 
chusetts Institute of Technology; AT&T Bell Laboratories, 1966—. Mr. Wit- 
senhausen has been associated with the Universite Libre de Bruxelles, with 
the European Computation Center, Brussels, and with the Princeton Research 
Division of Electronic Associates, Inc. At MIT he was with the Electronic 
Systems Laboratory as a Lincoln Laboratory Associate and a Hertz Fellow. 
At AT&T Bell Laboratories, he is associated with the Mathematics and 
Statistics Research Center. He was a Senior Fellow at the Imperial College of 
Science and Technology, London, in 1972, a Visiting Professor at MIT in 
1973, and Vinton Hayes Senior Fellow at Harvard University in 1975-76. He 
has worked on problems of hybrid computation, control theory, optimization, 
geometric inequalities, and other applied mathematical fields. 




AT&T Bell Laboratories Technical Journal 
Vol. 63, No. 7, September 1984 
Printed in U.S.A. 


Velocity-Saturated Characteristics of Short- 
Channel MOSFETs 


By G. W. TAYLOR* 
A theory is developed for the I-V characteristics of metal oxide semiconductor
field-effect transistors (MOSFETs) when the channel fields are sufficiently
high to cause appreciable saturation of the carrier drift velocity. The
full velocity-field curve for bulk silicon is used with the base value adjusted to 
account for surface scattering effects. Use of this form gave the best fit to 
experimental data. Using some simple expansions to reduce the rather complex 
integral produces a useful analytic result, which gives a continuous description 
from the square law results for long-channel devices throughout the whole 
range of velocity-saturated operation in short-channel devices. For the first 
time the electron temperature has been introduced as the parameter, which 
increases the channel charge at pinch-off, decreases the saturation voltage, 
and increases the channel field at the pinch-off point as the current (and 
hence bias voltages) is increased. The effects of series resistance and surface 
roughness scattering are incorporated into the analytic formulation. We com- 
pare the results with experimental submicron devices and find excellent 
agreement. 


I. INTRODUCTION 


As the metal oxide semiconductor field-effect transistor (MOSFET) 
evolves towards submicron channel dimensions it is found that, below 
gate lengths of 5 μm, its performance is modified by the effects of
hot-electron scattering on carrier transport. When a device reaches a gate
dimension of 0.5 μm, its complete “on” region is dominated by velocity
saturation. Almost all descriptions of these phenomena in the literature
are based on two-dimensional numerical solutions of total device
operation.’ * Such an approach provides insight into device operation 
and the nature of its physics, but it is of little practical use from a 
designer’s point of view, since it can realistically consider only a single 
device at a time. Generally, circuit simulation programs must use 
empirical fitting routines to generate device characteristics in the 
velocity-saturated mode because there is no analytical solution of this 
behavior. Some estimates of velocity-saturated conduction have been 
based on the simplified model of a constant mobility up until the 
saturation velocity is achieved;>* although some of these results have 
been useful over certain particular voltage ranges with parameter 
fitting, the interpretation of the physics has been lacking. In fact, 
academically, a more rigorous treatment of the problem greatly helps 
us visualize the ultimate limits of device miniaturization. A proper 
description of device performance in velocity saturation must be 
established as a natural extension of the device characteristics before 
velocity saturation becomes significant. However, since an analytical 
solution can be found only for certain special cases, one is usually 
forced to assume a two-section model as just described. Such an 
approach unavoidably leads to discontinuities in the incremental 
parameters and, particularly, in regions of crossover. In addition, it 
precludes prediction of the pinch-off field, which is fixed at $\mathcal{E} = \mathcal{E}_c$
in this model. The other missing feature in previous work has been 
the electron temperature. Since the carrier transport is due to electrons 
moving with their saturated drift velocity and hence at elevated 
temperatures, the electron temperature must be an integral part of the 
representation. 

In this paper we use the velocity-field relationship that most nearly 
fits the experimental data as a starting point to develop a model. The 
charge transport description is derived from basic principles and 
incorporates the electron temperature as a natural part of the solution. 
It is shown in detail how the velocity-saturated characteristics are a 
natural extension of the constant mobility case. We will consider the 
case of both the triode and the saturation regions; for the saturation 
case the details of channel length modulation under velocity-saturated 
conditions are derived in Appendix A. The effects of series contact 
resistance are incorporated into the theory, and the results are com- 
pared with data over a wide range of parameters. 


II. THEORY
2.1 The velocity-field relationship 


When the drift velocity can no longer be considered a linear function 
of the electric field, modifications to the metal oxide semiconductor 
(MOS) theory must be introduced. Many functional forms have been
used to characterize the decrease in mobility at higher electric-field
strengths.”* For the case of bulk Si, Conwell has shown that the
assumption of a Maxwellian distribution with electron temperature,
$T_e$, for the symmetrical part of the electron distribution function yields
predictions that agree with the experiment.* The result for the
energy-loss rate, $B(T_e)$, for phonon scattering is


$B(T_e) = q \mu_0 \left(\frac{T_0}{T_e}\right)^{1/2} \left(\frac{v_s}{\mu_0}\right)^2 \frac{T_e - T_0}{T_0},$ (1)


where $\mu_0$ is the zero field mobility, $T_0$ is the lattice temperature, $v_s$ is
the saturation velocity of electrons, and q is the electronic charge.
Now it has been shown that the general momentum and energy
relations are


$j = q n \mu(T_e) \mathcal{E} + q \frac{d}{dx} \left( n D(T_e) \right)$ (2a)

$j \mathcal{E} = n B(T_e) + \frac{j}{q} \frac{d}{dx} \left( \delta(T_e) k T_e \right),$ (2b)

respectively, where $\mathcal{E}$ is the electric-field strength; $\mu(T_e)$ is the
temperature, or field-dependent mobility; j is the electronic current
density; k is Boltzmann’s constant; n is the electron density; and $\delta(T_e) k T_e$
is the average kinetic energy transported per electron.’ Since we are
dealing with the MOS surface along which the scattering mechanisms
are not well known, (1) may not be appropriate; we will continue to
use it, however, to expedite the analysis. The second term on the
right-hand side of (2b) is equivalent to $dS(T_e)/dx$. Here $S(T_e)$ is the flux of
energy in the positive x direction; we have taken $S = -(j/q)\delta(T_e) k T_e$
only and ignored the small contribution $-K(T_e)(dT_e/dx)$ due to
the thermal conductivity of the electrons. Note that $\delta$ typically is
approximately a constant for a particular relaxation time relation and
for acoustic phonon scattering $\delta \approx 2$.¹⁰ Equations (2a) and (2b) have
three unknowns—$\mathcal{E}$, n, and $T_e$—so that the third relation of Poisson’s
equation for the MOS channel region is required to obtain a complete
solution. The most familiar form of this equation is the one-dimensional
charge equation (the gradual channel approximation) for the
MOS channel, which is


$q \int_0^\infty n \, dy = C_o (V_{GS} - V_T - V),$ (2c)


where V is related to $\mathcal{E}$ by


$\int_0^x \mathcal{E} \, dx' = V$

and $C_o$ is the oxide capacitance. These equations are solved by
simplification of (2a). The right-hand side of (2a) is the sum of the drift and
diffusion components of the device current, the drift component being
characterized by a drift velocity,


$v = \mu(T_e) \mathcal{E}.$ (3)


It is well known for conduction above the threshold voltage that over 
most of the MOS channel, drift is the dominant component, and 
hence, only the first term on the right-hand side of (2a) needs to be 
retained. We can use this result in (2b); then by neglecting the second 
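term on the right-hand side of (2b) and substituting from (1) for $B(T_e)$,
we obtain the result


$q \mu_0 \left(\frac{T_0}{T_e}\right)^{1/2} \left(\frac{T_e}{T_0} - 1\right) \mathcal{E}_c^2 = q \mu \mathcal{E}^2.$


Solving for $T_e$ we find

$T_e = T_0 \left\{ \frac{\mu}{2\mu_0} \left(\frac{\mathcal{E}}{\mathcal{E}_c}\right)^2 + \left[ \frac{\mu^2}{4\mu_0^2} \left(\frac{\mathcal{E}}{\mathcal{E}_c}\right)^4 + 1 \right]^{1/2} \right\}^2,$ (4)

where $\mathcal{E}_c$ is the critical field parameter,

$\mathcal{E}_c = v_s/\mu_0.$ (5)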


We can then use this result to evaluate the relative importance of the
neglected term in (2b). Using (2c) in (2a) we can write the current-field
relation for the MOS channel, which is


$I = W \mu(\mathcal{E}) C_o (V_{GS} - V_T - V) \mathcal{E}.$ (6a)

If we then use a typical mobility

$\mu = \mu_0 (T_0/T_e)^{1/2}$ (6b)

such as one might find in a bulk crystal, we can use (4) and (6a) to 
evaluate the relative importance of the terms in (2b), and we find 


kT>| eV’ 

J scr.yp GE — + (=) | ; 

q dx _ 28(T.) 2 ee (=) (6c) 
IES = *" (Ves — Vr — V) \&/ 


This ratio is small until we are close to pinch-off in the channel or
unless $\mathcal{E} > \mathcal{E}_c$. The restriction on this term comes from neglecting
the diffusion term in (2a). We will assume that (4) is valid throughout
the MOS channel.

Equation (4) has been derived without reference to a specific function,
$\mu(\mathcal{E})$. We therefore can use (4) to determine $T_e$, and we must
still determine $\mu(\mathcal{E})$. To determine $\mu(\mathcal{E})$, we could just assume the




bulk relation (6b), which would then yield the results


$\mu = \frac{\mu_0}{[1 + (\mathcal{E}/\mathcal{E}_c)^2]^{1/2}}$ (7a)

and

$T_e = T_0 [1 + (\mathcal{E}/\mathcal{E}_c)^2].$ (7b)


However, there has been much speculation that this form is inadequate
for the Si surface mobility,’”* and a variation of this form that has
been used fairly widely is


$\mu(\mathcal{E}) = \frac{\mu_0}{[1 + (\mathcal{E}/\mathcal{E}_c)^B]^{1/B}},$ (7c)


where B was determined by Caughey and Thomas’ to be B = 1.1. To
illustrate the variation of $v(\mathcal{E})$, plots were generated and are shown
in Fig. 1 for values of B ranging from 1 to ∞. The important
parameters—$\mu_0$, $\mathcal{E}_c$, and $v_s$—are identified in the figure.
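The family (7c) is straightforward to evaluate. The sketch below computes the mobility and the drift velocity $v = \mu \mathcal{E}$ for a chosen B, using the values quoted with Fig. 1 ($\mu_0 = 650$ cm²/V·s, $\mathcal{E}_c = 1.84 \times 10^4$ V/cm) and $v_s = \mu_0 \mathcal{E}_c$ from (5). The function names are illustrative, and the form of (7c) is taken in its usual Caughey-Thomas reading.

```python
MU0 = 650.0          # cm^2/V-s, value quoted with Fig. 1
E_CRIT = 1.84e4      # V/cm, critical field marked in Fig. 1
V_SAT = MU0 * E_CRIT # cm/s, from equation (5)

def mobility(field, B, mu0=MU0, e_crit=E_CRIT):
    """Equation (7c): mu0 / [1 + (E/E_c)^B]^(1/B)."""
    return mu0 / (1.0 + (field / e_crit) ** B) ** (1.0 / B)

def drift_velocity(field, B, mu0=MU0, e_crit=E_CRIT):
    """Drift velocity v = mu(E) * E, in cm/s for E in V/cm."""
    return mobility(field, B, mu0, e_crit) * field
```

At the critical field the velocity is $v_s/2$ for B = 1 and $v_s/\sqrt{2}$ for B = 2, and for every finite B the velocity approaches $v_s$ from below as the field grows.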

Because the determination of B at this stage in our understanding
of MOS surface physics is unavoidably experimental, we must, at the
outset, choose a value for B that will allow a physical solution to be
found. This is necessary because leaving B undetermined makes the
problem mathematically intractable. The vindication of this approach
ultimately will come from an unambiguous determination of
the parameters $\mu_0$, $\mathcal{E}_c$, $v_s$, and B from the experimental data. Since it
is suspected that B = 2 for N channel devices (which is of main interest
here), we will give the greatest attention to the treatment of this form.
In our final comparison with experiment we will demonstrate that the
value of B = 2 is a good one.

Fig. 1—Variation of $v(\mathcal{E})$ (cm/s) with $\mathcal{E}$ (V/cm) for the formula (7c) with values of B ranging from 1 to ∞; the figure marks $\mathcal{E}_c = 1.84 \times 10^4$ V/cm.
A value of $\mu_0 = 650$ cm²/V·s was used here simply to clarify the variation of v with
electric field.

In the remainder of the paper we will discuss the effects of a
field-dependent mobility, $\mu(\mathcal{E})$, on the MOS I-V characteristic and the
resultant modifications to the conventional device laws (B = ∞) for
B = 2 generally and with some mention of the case B = 1. The theory
will show, as is well known in short-channel devices, that when the
drift velocity of carriers approaches the saturated value, the current
is reduced substantially from that of the conventional case.

The theory will also show the well-known experimental result that 
the drain saturation voltage for the velocity-saturated case is substan- 
tially less than the conventional value. It is shown that the decrease 
in voltage is caused by the increase in the charge in the channel at 
pinch-off under hot-electron conditions resulting from the elevated 
electron temperature. 


2.2 Device characteristics 
2.2.1 Triode region 


Figure 2 gives a device cross section, which shows terminal voltages
and possible series resistances in the source and drain leads. When
the general mobility relationship (7c) is used, only for the special case
of B = 1 can a closed form solution for the current-voltage characteristic
be obtained. If we use (5) and (7c) for B = 1, the result shown by
Hoeneisen? is


$I = \frac{\mu_0 C_o W}{L} \, \frac{(V_{GS} - V_T) V_{DS} - V_{DS}^2/2}{1 + V_{DS}/(\mathcal{E}_c L)}.$ (8)


Fig. 2—Device cross section showing terminal voltages and possible parasitic series
resistors.


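For the case of B = 2, the MOS device current, when approximated
by the drift mechanism only, yields

$I = \frac{W \mu_0}{[1 + (\mathcal{E}/\mathcal{E}_c)^2]^{1/2}} \, C_o (V_{GS} - V_T - V) \mathcal{E}.$ (9a)


By rearranging terms and changing variables, we obtain


$\frac{I L}{W C_o \mu_0} = \int_{V_{GS} - V_T - V_{DS}}^{V_{GS} - V_T} (z^2 - I'^2)^{1/2} \, dz,$ (9b)

where

$I' = \frac{I}{W C_o v_s}, \qquad z = V_{GS} - V_T - V.$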


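Performing the integration, we find


$I = \frac{W C_o \mu_0}{2L} \Bigg\{ (V_{GS} - V_T) \left[ (V_{GS} - V_T)^2 - \left(\frac{I}{W C_o v_s}\right)^2 \right]^{1/2} - (V_{GS} - V_T - V_{DS}) \left[ (V_{GS} - V_T - V_{DS})^2 - \left(\frac{I}{W C_o v_s}\right)^2 \right]^{1/2}$
$\qquad - \left(\frac{I}{W C_o v_s}\right)^2 \ln \frac{V_{GS} - V_T + \left[ (V_{GS} - V_T)^2 - \left(\frac{I}{W C_o v_s}\right)^2 \right]^{1/2}}{V_{GS} - V_T - V_{DS} + \left[ (V_{GS} - V_T - V_{DS})^2 - \left(\frac{I}{W C_o v_s}\right)^2 \right]^{1/2}} \Bigg\}.$ (10a)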


Previously, it has been shown that (for the constant mobility case) it 
is possible to incorporate the effects of the bulk charge variation into 
Vr.'® As the gate voltage increases into the range where velocity 
saturation becomes significant, the threshold voltage modulation by 
the drain voltage becomes relatively unimportant. We will therefore 
assume that we can continue to use the drain-voltage dependence of 
Vr that was developed for the constant mobility case. Equation (10a) 
describes the device characteristic exactly up until pinch-off has been 
achieved but, because of its implicit nature, it is of relatively little use. 
However, we produce useful results with some simple approximations. 
Consider the terms under the second square root sign. This quantity 




must be positive to have a physically meaningful result, so that


$\left(\frac{I}{W C_o v_s}\right)^2 < (V_{GS} - V_T - V_{DS})^2.$
Now the current on the left-hand side is a maximum and the right- 
hand side is a minimum at the pinch-off condition, and if the equality 
sign were obeyed at this point we would have 


T= C, WusQ. (10b) 


In writing (10b) we have used the definition of $\bar{Q}$ as the channel charge
at the pinch-off point, as we develop in Appendix A [see (83)]. From 
this result we see that the inequality rather than the equality is always 
obeyed, since (10b) requires that all of the charge be moving with the 
saturated velocity at the pinch-off point. This condition can never be 
reached because of the contribution of diffusion to the current flow. 
We therefore note that under all square root signs in (10) we will have 
positive quantities and also that the voltage term is generally much 
larger than the term $I'$. We may therefore make expansions of all
square root quantities, and by retaining terms to the first order, we 
obtain 


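$I = \frac{1}{2} \mu_0 \frac{W}{L} C_o \Bigg\{ (V_{GS} - V_T)^2 - (V_{GS} - V_T - V_{DS})^2 - I'^2 \ln \frac{2(V_{GS} - V_T) - \dfrac{I'^2}{2(V_{GS} - V_T)}}{2(V_{GS} - V_T - V_{DS}) - \dfrac{I'^2}{2(V_{GS} - V_T - V_{DS})}} \Bigg\}.$ (10c)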


For the moment we will neglect terms in $I'$ in the argument of the
logarithm without incurring significant error, and we may then solve
the resulting quadratic equation for I to obtain


$I = W v_s C_o \left( \left\{ \left(\frac{\mathcal{E}_c L}{a}\right)^2 + \frac{\left[ (V_{GS} - V_T)^2 - (V_{GS} - V_T - V_{DS})^2 \right]}{a} \right\}^{1/2} - \frac{\mathcal{E}_c L}{a} \right),$ (11a)

where

$a = \ln \left( \frac{V_{GS} - V_T}{V_{GS} - V_T - V_{DS}} \right).$ (11b)


This expression for the current is very useful because it allows us
to predict the two limiting forms of conduction that are always
observed in a short-channel device. For small gate (drain) voltages the
term in square brackets is small, and we may expand under the square
root sign to obtain


$I \approx \frac{\mu_0 C_o W}{L} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right],$ (12)


which is precisely the form of the constant mobility case that we
expect to obtain when the effects of velocity saturation can be ignored.
On the other hand, when this term becomes large, only this term needs
to be retained, and we find


$I = v_s C_o W \left\{ \frac{2}{a} \left[ (V_{GS} - V_T) V_{DS} - \frac{V_{DS}^2}{2} \right] \right\}^{1/2}.$ (13)

This equation applies to only a limited range, since typically there is
only a small range of drain voltage in which velocity saturation is
dominant before pinch-off occurs. This will become evident in our
comparison of the exact and approximate solutions.
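The two limits of the approximate triode current can be checked numerically. The sketch below evaluates the approximate B = 2 expression (11a), written in terms of the overdrive $V_{GS} - V_T$, and compares it with the constant mobility law (12). The mobility and saturation velocity are the values printed with Fig. 3; the oxide capacitance, the width, and the function names are assumptions chosen only for illustration.

```python
import math

MU0 = 600.0      # cm^2/V-s, value printed with Fig. 3
V_SAT = 8.0e6    # cm/s, value printed with Fig. 3
C_OX = 6.9e-8    # F/cm^2, assumed (roughly a 500 Angstrom oxide)
WIDTH = 1.0e-3   # cm, hypothetical device width

def square_law_current(v_ov, v_ds, length):
    """Equation (12): constant mobility triode current, v_ov = V_GS - V_T."""
    return MU0 * C_OX * WIDTH / length * (v_ov * v_ds - 0.5 * v_ds ** 2)

def triode_current_b2(v_ov, v_ds, length):
    """Equation (11a) for B = 2, valid for 0 < v_ds < v_ov."""
    e_crit = V_SAT / MU0                         # equation (5)
    a = math.log(v_ov / (v_ov - v_ds))           # equation (11b)
    el_over_a = e_crit * length / a
    bracket = (v_ov ** 2 - (v_ov - v_ds) ** 2) / a
    return WIDTH * V_SAT * C_OX * (math.sqrt(el_over_a ** 2 + bracket) - el_over_a)
```

For a 100 μm channel (`length=1e-2`) the two functions agree closely, while for a 0.5 μm channel (`length=5e-5`) the velocity-saturated current falls below the square law value, as the text describes.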




2.2.2 Saturation region 


Traditionally, in MOSFET physics the terms operation in pinch- 
off and operation in saturation were interchangeable because it was 
generally thought that pinch-off was necessary for saturation to occur. 
For some time now it has been recognized that diffusion becomes 
important near the pinch-off point.!” It is shown in Appendix A [see 
(79) through (83)] that the saturation voltage is reached when a 
particular ratio, R, of drift to diffusion conduction is reached in the 
channel. R is derived in Appendix B [see (200) and (201)] in terms of 
the basic device parameters. The transfer of the conduction mecha- 
nism from drift to diffusion is the extent of any pinching effect, since 
beyond this point in the channel the drift component increases once 
again, the oxide field reverses in direction, and the current moves 
away from the surface into the bulk. Saturation of current occurs 
basically because of the occurrence of field reversal at some drain 
voltage. The saturation point and the field-reversal point are separated 
in potential in the channel by a potential, $Q/C_o$, where Q is the channel
charge (mobile charge) at the saturation point. For the constant 
mobility case the charge, Q, is small, so that the potential difference 
between the two points is small in that case. Hence, there really was 
no need to discriminate between them. As velocity saturation becomes 
pronounced, the charge, Q (and thus voltage, $Q/C_o$), grows considerably,
leading to a substantial change in $V_{SAT}$ (the drain saturation
voltage) from its conventional value. Thus, the potential difference 
between the saturation point and the field-reversal point continues to 
grow, as does the physical separation between the two points, although 
the potential will increase more since the field is also increasing. From 
this point on we will refer to the point in the channel where the
diffusion current is $RI$ as the saturation point or the pinch-off point,
interchangeably. We will distinguish between it and the field-reversal 
point. 

Under conditions of hot-electron flow (as in Appendix A) we will
continue to use the Einstein relation


$D(\mathcal{E}) = \frac{k T_e}{q} \mu(\mathcal{E})$ (14)


to relate the mobility and diffusion coefficients through the effective
electron temperature, $T_e$. This has recently been shown to be a
reasonable assumption if it is recognized that, for mobilities significantly
below $\mu_0$, the low field value, the electron temperature will be
higher than the lattice temperature.’® Also, according to (2a) we will
consider the term


$D(\mathcal{E}) \, \frac{dQ}{dx} \gg Q \, \frac{dD(\mathcal{E})}{dx},$

i.e., the diffusion constant is not a strong function of the electric field.


The result for the saturation voltage obtained in Appendix A [cf.
(82)] is


$\bar{V} = V_{SAT} = V_{GS} - V_T - \frac{(1 - R)}{R} \, \frac{k T_e}{q} \, \frac{(C_o + \bar{C}_s F)}{C_o},$ (15)


where F is the short-channel factor’® at the threshold condition and
$\bar{C}_s$ is the semiconductor depletion capacitance at the pinch-off point.
Each quantity denoted by a bar will signify its value at the pinch-off
point. From this result we can see that the charge at the pinch-off
point now continues to grow as we move into the hot-electron regime,
linearly with the electron temperature. Since we are using B = 2 and
have shown for this case that $T_e$ is given by (7b), then we have


$\bar{Q} = \frac{(1 - R)}{R} \, \frac{k T_0}{q} \, (C_o + \bar{C}_s F) \left[ 1 + \left(\frac{\bar{\mathcal{E}}}{\mathcal{E}_c}\right)^2 \right].$ (16)

The value of $\bar{\mathcal{E}}$, the electric field at the pinch-off position, is
determined from the drift relation


$I (1 - R) = W \mu(\bar{\mathcal{E}}) \, \bar{Q} \, \bar{\mathcal{E}}.$ (17a)

Using (17a) and (7a) in (16) we obtain

$\left(\frac{\bar{\mathcal{E}}}{\mathcal{E}_c}\right)^2 \left[ 1 + \left(\frac{\bar{\mathcal{E}}}{\mathcal{E}_c}\right)^2 \right] = \left[ \frac{I (1 - R)}{W v_s Q_0} \right]^2,$ (17b)

which has the solution


$\frac{\bar{\mathcal{E}}}{\mathcal{E}_c} = \left( \left\{ \frac{1}{4} + \left[ \frac{I (1 - R)}{W v_s Q_0} \right]^2 \right\}^{1/2} - \frac{1}{2} \right)^{1/2},$ (17c)

where

$Q_0 = \frac{(1 - R)}{R} \, \frac{k T_0}{q} \, (C_o + \bar{C}_s F)$


is the value of charge at the pinch-off point in the absence of
hot-electron effects.!® This is an interesting result because it encompasses
all modes of operation from the region of low currents without velocity
saturation through the regime of high fields dominated by velocity
saturation. For low currents we can expand the square root and
(keeping only the first term) find that, without velocity saturation, we
have


$\bar{\mathcal{E}} = \frac{I (1 - R)}{W \mu_0 Q_0}.$ (17d)


If the current (and therefore the channel field) is high (velocity
saturation becomes dominant), then we have


$\bar{\mathcal{E}} = \mathcal{E}_c \left[ \frac{I (1 - R)}{W v_s Q_0} \right]^{1/2},$ (17e)


so that the field changes from a linear to a square root dependence
upon the device current. We note that from (17d) and (17e) the
transition from one to the other takes place at $\bar{\mathcal{E}} = \mathcal{E}_c$, as we would
expect. We can also determine from (17c) that the device current that
flows when $\bar{\mathcal{E}} = \mathcal{E}_c$ is


$I = \sqrt{2} \, \frac{W v_s Q_0}{(1 - R)}.$ (17f)
The interesting feature of this result is the independence of channel
length. We should, thus, expect devices of all gate lengths to change
to velocity-saturated behavior at the same value of channel current. If
we now use (17c) in (16) we obtain


$\bar{Q} = \frac{(1 - R)}{R} \, \frac{k T_0}{q} \, (C_o + \bar{C}_s F) \left\{ \frac{1}{2} + \left[ \frac{1}{4} + \frac{I^2 (1 - R)^2}{(W v_s Q_0)^2} \right]^{1/2} \right\},$ (18a)


and again we can obtain the limiting cases


$\bar{Q} = Q_0$

and

$\bar{Q} = \frac{I (1 - R)}{W v_s}$ (18b)


for low and high currents, respectively. A useful approximation to
(18a) is found to be


$\bar{Q} = Q_0 \left[ 1 + \frac{I (1 - R)}{W v_s Q_0} \right],$ (18c)

or

$\bar{Q} = Q_0 + \frac{I (1 - R)}{W v_s},$


expressions that are mathematically more convenient. The
approximation will introduce some error when the two terms under the root
sign are equal, but it yields the correct value in the extremes. Using
the saturation voltage (15), we may now determine the saturation
current; using (8) and (11a) we obtain for the cases of B = 1 and B =
2, respectively,


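$I_{SAT} = \frac{\mu_0 C_o W}{2 \bar{y}} \, \frac{(V_{GS} - V_T)^2 - (\bar{Q}/C_o)^2}{1 + \dfrac{V_{GS} - V_T - \bar{Q}/C_o}{\mathcal{E}_c \bar{y}}}, \qquad B = 1,$ (19a)

and

$I_{SAT} = W v_s C_o \left( \left\{ \left(\frac{\mathcal{E}_c \bar{y}}{a}\right)^2 + \frac{(V_{GS} - V_T)^2 - (\bar{Q}/C_o)^2}{a} \right\}^{1/2} - \frac{\mathcal{E}_c \bar{y}}{a} \right), \qquad B = 2,$ (19b)


where $\bar{y}$ is the position of the pinch-off point in the channel. Since we
do not have the equivalent to (6b) for the case of B = 1, we cannot
determine $T_e$ or, therefore, $\bar{Q}/C_o$; thus, we can proceed no further with
that case.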
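The pinch-off relations (17c), (18a), and (18c) depend on the current only through the dimensionless ratio $k = I(1 - R)/(W v_s Q_0)$, so their behavior can be explored without device parameters. The sketch below is a minimal illustration under that normalization; the symbol k and the function names are introduced here and are not part of the original text.

```python
import math

def normalized_pinchoff_field(k):
    """Equation (17c): pinch-off field over critical field, k = I(1-R)/(W v_s Q_0)."""
    return math.sqrt(math.sqrt(0.25 + k * k) - 0.5)

def pinchoff_charge_ratio(k):
    """Equation (18a): pinch-off charge over Q_0."""
    return 0.5 + math.sqrt(0.25 + k * k)

def pinchoff_charge_ratio_approx(k):
    """Equation (18c): the linear approximation 1 + k."""
    return 1.0 + k
```

At $k = \sqrt{2}$ the normalized field equals 1, which is the transition current (17f); for small k the field grows linearly with k as in (17d), and for large k it grows as $\sqrt{k}$ as in (17e). The linear approximation never underestimates the charge, and its error is largest near k of order one.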

Consider the case of B = 2. By substituting (18c) into (19b) and 
rearranging, we find 


(Lev.CW)  (wCW)?[ ve (OT _ 
+2 Ga) I- (i +a) ves Vr) -(2)|=o. (20a) 


which has the solution 
Isat = W,C, 


{(2) , Vos — Vr}? - eT" 7 2), (20b) 


a* a* * 
where
$a^* = a + 1.$ (20c)


By comparison with (11a) it can be seen that the expressions for the 
current in the triode region and saturation region are very similar. 
The currents are identical at the pinch-off position because the change 
from the parameter $a$ to $a^*$ exactly compensates for the change from
$\bar{Q}/C_o$ to $Q_0/C_o$. In using (18c) in (19b) we have omitted the factor
(1 — R). We must do this to obtain consistent results because (19b) 
was derived on the basis of drift alone. 

As in the case of the triode region, (20b) can predict two limiting
cases. For small gate voltages, the appropriate expansion of the square 


root yields 
C,W oN 
[= a Ves — Vr)? - (2) | ; (21a) 


as we would expect in the absence of hot-electron effects. On the other 
hand, in the extreme of large gate voltages we obtain the result 





[I = =~ (Ves — V7). (21b) 


An interesting feature of this equation is the absence of y and therefore 
the absence of channel-length modulation effects. The interpretation 
is that for high enough gate voltages and/or short enough gate lengths, 
the velocity of carriers at the source approaches the saturation velocity. 
In this situation the current is determined only by mobile charge at 
the source, which is independent of gate length and depends only on 
the gate voltage. Although we can approach this situation, we could 
not achieve it in practice because of the onset of breakdown and 
punchthrough effects. 
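
The two limiting cases can be checked numerically. The sketch below is illustrative only. It assumes that the saturation current (20b), normalized to $W v_s C_0$, has the form $\sqrt{(\mathscr{E}_c y/a^*)^2 + [(V_{GS} - V_T)^2 - (Q_0/C_0)^2]/a^*} - \mathscr{E}_c y/a^*$, and that $v_s = \mu_0 \mathscr{E}_c$. The function names and the numerical values are hypothetical and are not taken from the paper.

```python
import math


def isat_normalized(vg, q0_over_c0, ec_y, a_star):
    """I_SAT / (W v_s C_0) under the assumed form of (20b).

    vg          : V_GS - V_T in volts
    q0_over_c0  : Q_0 / C_0 in volts
    ec_y        : E_c * y in volts
    a_star      : dimensionless parameter a* = a + 1
    """
    x = ec_y / a_star
    return math.sqrt(x * x + (vg * vg - q0_over_c0 ** 2) / a_star) - x


def isat_small_vg(vg, q0_over_c0, ec_y):
    """Limit (21a) in the same normalization, using v_s = mu_0 * E_c."""
    return (vg * vg - q0_over_c0 ** 2) / (2.0 * ec_y)


def isat_large_vg(vg, a_star):
    """Limit (21b) in the same normalization."""
    return vg / math.sqrt(a_star)


if __name__ == "__main__":
    # Hypothetical operating points chosen only to exercise the two limits.
    print(isat_normalized(0.1, 0.02, 2.0, 2.0), isat_small_vg(0.1, 0.02, 2.0))
    print(isat_normalized(200.0, 0.02, 2.0, 2.0), isat_large_vg(200.0, 2.0))
```

Under these assumptions the normalized current approaches the square-law value at small gate drive and the $y$-independent value at large gate drive, which is the behavior described for (21a) and (21b).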

Returning to (15) and using our approximate value of $Q$ from (18c),
we find the saturation voltage to be

$$V = V_{SAT} = V_{GS} - V_T - \frac{Q_0}{C_0}\left(1 + \frac{I}{W v_s Q_0}\right). \tag{22}$$





Using (20b) in (22), we then find 
~\2 _\2 
1 SE. 1 
Var = Vos ~ Ve~ |(&) (.-4)+2(2 ar 
2 ey g 271/2 
-(Ves — Vr)? - ae \y (2) +2 + — (Ves — Vr)? (2) . (28) 
This result is almost the same as one obtained in the literature using
a piecewise continuous constant mobility and velocity-saturated
model; i.e., from the continuity of current at a field $\mathscr{E}_c$ in
the channel one has

$$\mu_0 C_0 \frac{W}{y}\left[(V_{GS} - V_T)V_{SAT} - \frac{V_{SAT}^2}{2}\right] = v_s C_0 W (V_{GS} - V_T - V_{SAT}), \tag{24}$$


[Fig. 3 plot: $V_{SAT}$ (V) versus $V_G - V_T$ (V), 0 to 14 V. Curves: exact, approximate, and piecewise linear approach. Parameters: $v_s = 8 \times 10^6$ cm/s, $L = 1.5$ μm, $t_{ox} = 500$ Å, $\mu_0 = 600$ cm²/V·s.]

Fig. 3—Variation of $V_{SAT}$ with gate voltage showing the complete solution using B = 2,
the approximate solution using B = 2, and the piecewise linear constant mobility
result.
which may be solved to yield

$$V_{SAT} = V_{GS} - V_T + \mathscr{E}_c y - \sqrt{(V_{GS} - V_T)^2 + (\mathscr{E}_c y)^2}. \tag{25}$$


These two results are compared in Fig. 3, and it is seen that over much
of the range of velocity saturation, (25) is rather inaccurate. The use
of (25) to predict the current can then lead to errors of about 20
percent, as we show later in Fig. 9. By using (23) we obtain a good
approximation to the exact result [the form used for $a$ is described
later by (33a)]. The differences between these two curves arise simply
from the errors introduced in making the expansions to solve the
original current equation (10a). The only drawback to (23) is that
some iteration is required at the transition from the triode to the
saturation region since $a^*$ is a weak function of $V_{SAT}$ through the
logarithmic term. As we can see from (10b), this voltage enters only
weakly into the current itself through $a^*$, but is necessary to determine
where the saturation and triode regions meet and to provide an
accurate merging of the two. In the calculation of $I_{SAT}$ it should be
noted that it is not necessary to calculate (23). Since the current is
given explicitly by (20b), (22) can then be used to find $V_{SAT}$. It would
also be possible to treat $a$ as a constant parameter to be determined
so that no iteration is required; however, it is shown later that the
variation in $a$ is important in achieving continuity of $g_m$.
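
As a small illustration of the piecewise result, the sketch below evaluates (25) and confirms that it satisfies the current-continuity condition (24). It assumes $v_s = \mu_0 \mathscr{E}_c$, so that (24) can be divided through by $v_s C_0 W$. The function names and the sample bias point are hypothetical.

```python
import math


def vsat_piecewise(vg, ec_y):
    """Eq. (25): saturation voltage of the piecewise constant-mobility model.

    vg   : V_GS - V_T in volts
    ec_y : E_c * y in volts
    """
    return vg + ec_y - math.sqrt(vg * vg + ec_y * ec_y)


def continuity_residual(vg, ec_y, vsat):
    """Left side minus right side of (24), divided by v_s * C_0 * W.

    Uses the assumption v_s = mu_0 * E_c, so mu_0 / (v_s * y) = 1 / (E_c * y).
    """
    lhs = (vg * vsat - 0.5 * vsat * vsat) / ec_y
    rhs = vg - vsat
    return lhs - rhs


if __name__ == "__main__":
    # Hypothetical point: E_c = 1.7e4 V/cm and y = 1.5 um give E_c*y = 2.55 V.
    vg, ec_y = 4.0, 1.7e4 * 1.5e-4
    vsat = vsat_piecewise(vg, ec_y)
    print(vsat, continuity_residual(vg, ec_y, vsat))
```

For a very large $\mathscr{E}_c y$ the function returns a value close to $V_{GS} - V_T$, the long-channel saturation voltage, which is a useful sanity check on the sign conventions.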

In line with the reasoning that led to (24) it is of some interest to
consider the field $\mathscr{E}_c$ as a kind of dividing line in the channel
between the non-velocity-saturated and the velocity-saturated portions.
Using (9a),

$$V_{GS} - V_T - V_C = \frac{\sqrt{2}\,I}{v_s C_0 W} \tag{26}$$

is the condition in the channel when a field of $\mathscr{E} = \mathscr{E}_c$ is reached at a
channel potential of $V = V_C$. We can then use (10b) to give $V_C$ in the
triode region, which is


V24L 





Vo = Vas -— Vr t+ 


V28L 








» (27) 


where a is given by (11b) and is a known function of Veg and Vps. We 
can then use this result in our expression (11a) for the current to find 
the position in the channel at which this field is achieved; i.e., using 
(26) in (11a) yields 


2 2 
- (Ves — Vr)? - os (Ves — Vr — Vps)? + ( 
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yo = V2 


Ves — Vr —- Ve)’ 
Watt aiveee te eS 


SE Ves — Vr — Ve) 


where V¢ is given by (27). If we follow the same procedure for operation 
in saturation then we obtain from (20b) and (26) 


Ves — Vr - Ve 


wes 2 ~ \27) 1/2 W285 
{EE +2[0-v9 QP 


which is similar in form to (27) but now depends on $y$. The solution
(28) applies to saturation or triode operation. Consider the situation
$y_c = 0$, which implies either that the carriers are subjected to a field
greater than $\mathscr{E}_c$ at all points in the channel or that they are moving
with a velocity of $v_s/\sqrt{2}$ at the source end of the channel. Using (28)
to find $V_{GS} - V_T - V_C$ and (11a) for $I$ we find


Wo, C 
= s V, 
Weare is a> 


i.e., we are very close to the velocity-saturated limit of (21b). It is then
of some interest to determine the applied voltage for which this
condition is achieved as a function of gate length. Using (20b) we find

$$(V_{GS} - V_T)' = 2\mathscr{E}_c L\sqrt{a^* + 1}. \tag{30}$$

This voltage value is plotted as a function of gate length in Fig. 4
using the values of $a^*$ as determined by (33a) and (33b) (and shown
plotted in Fig. 5). For $L = 0.5$ μm this condition is not achieved until
3 V are applied to the device. These results simply point to the fact
that the use of $L/v_s$ to determine transit times usually gives values
that are too small by a factor of about 2.
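
The arithmetic behind the 3-V figure can be sketched as follows. The value $\mathscr{E}_c = 1.7 \times 10^4$ V/cm is the one quoted in the Fig. 6 legend; the value $a^* = 2$ is an assumed representative number, not a value stated in the text, and the function name is hypothetical.

```python
import math


def gate_drive_for_source_velocity(ec_v_per_cm, length_cm, a_star):
    """Eq. (30): V_GS - V_T at which the source-end velocity is v_s/sqrt(2)."""
    return 2.0 * ec_v_per_cm * length_cm * math.sqrt(a_star + 1.0)


if __name__ == "__main__":
    EC = 1.7e4      # V/cm, from the Fig. 6 legend
    A_STAR = 2.0    # assumed representative value of a*
    for length_um in (0.5, 1.0, 1.5):
        v = gate_drive_for_source_velocity(EC, length_um * 1e-4, A_STAR)
        print(f"L = {length_um} um -> (V_GS - V_T)' = {v:.2f} V")
```

With these inputs the 0.5-μm case evaluates to roughly 2.9 V, in line with the value of about 3 V quoted above, and the result scales linearly with $L$ when $a^*$ is held fixed.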


(28) 








Vr), 


2.3 Discussion of results 
2.3.1 I-V curves and current saturation 


The interpretation of the phenomenon of the saturation of the drain
current presented here is the following. Classical saturation of the
current occurs when the charge in the channel has been reduced to
the minimum value given by (16). This condition always occurs in the
channel just before inversion in sign of the transverse oxide field
occurs; i.e., pinch-off occurs at a channel potential given by (15), and
when this potential has increased by another $Q/C_0$ volts, we have zero
voltage across the oxide, resulting in field inversion. Therefore,
pinch-off in the channel will always be associated with a field inversion
in the oxide. Now in the case without velocity saturation the voltage
$Q/C_0$ is typically 0.05 V, which is dropped over a fairly short distance
(of the order of 0.1 μm) in the channel, so that it is clear that
pinch-off and field inversion occur at almost the same point in the channel.
When velocity saturation becomes dominant, $Q/C_0$ becomes a sizeable
term compared to $V_{GS} - V_T$ because of the dependence of $Q$ upon the
electron temperature. This is the reason for the apparently lower drain
saturation voltage in the velocity-saturated regime, as we can see from
(15) and the plots in Fig. 3. Under these conditions the voltage drop
in the channel between the pinch-off point and the field inversion
point, and the distance between these two points, may grow considerably
from the constant mobility case.

[Fig. 4 plot: $(V_G - V_T)'$ (V) versus $L$ (μm).]

Fig. 4—Variation of $(V_{GS} - V_T)'$, the gate voltage for which the source velocity is
$v_s/\sqrt{2}$, with gate length.

The parameter $a$ requires more attention for the computation of
accurate results. From the results (12) and (21a) it is clear that $a$ is
an important parameter only in the limit of velocity saturation.
Consider the value of $a$ at the saturation voltage, for which case it has
its greatest effect. Taking $a$ in its complete form from (10c) we have


a 
Ves — Vr+ (Ves — Vr) \/ er aa 


ceh | ————— > >? CU CU 
O/C, + G/C, ie) 
where we have used (18c) in the denominator and also the fact that in
the limit of velocity saturation we have from (20b) or (11a)

$$\frac{I}{W v_s C_0} = \frac{V_{GS} - V_T}{\sqrt{1 + a}}. \tag{32}$$

From this result we see that in the limit of a saturated drift velocity
if $a < 1$, then we should be able to express $a$ as a function of only
$(V_{GS} - V_T)/(Q/C_0)$. The result we obtain should be useful over the
whole range of operation because, although determined from
velocity-saturated conditions, $a$ disappears from the expressions for the
current when velocity saturation becomes unimportant. We have found by
iteration that the result

$$a = 1.24 \ln\left(\frac{V_{GS} - V_T}{V_{GS} - V_T - V_{DS}}\right) \tag{33a}$$

in the triode region, which goes to

$$a = 1.24 \ln\left(\frac{V_{GS} - V_T}{Q/C_0}\right) \tag{33b}$$


at the saturation voltage, gives excellent agreement over a wide range
of parameters. The factor 1.24 was chosen to match the approximate
and exact formulas for the current well into velocity-saturated
operation both in the triode and saturation current regions. It was not
chosen to match the exact and approximate forms of (31) itself so that
it could be used in a more general way to compensate for all errors
involved in the expansions in (10b). The corresponding value of $a^*$ is
found by using (33b) in (20c).
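
The fitted expressions for $a$ are simple enough to state as code. The sketch below is a minimal illustration of (33a), (33b), and (20c); the function names are hypothetical, and the inputs are the gate drive $V_{GS} - V_T$, the drain voltage, and $Q/C_0$, all in volts.

```python
import math

FIT_FACTOR = 1.24  # empirical factor quoted in the text


def a_triode(vg, vds):
    """Eq. (33a): vg = V_GS - V_T, valid for 0 <= vds < vg."""
    return FIT_FACTOR * math.log(vg / (vg - vds))


def a_saturation(vg, q_over_c0):
    """Eq. (33b): value of a at the saturation voltage."""
    return FIT_FACTOR * math.log(vg / q_over_c0)


def a_star(a):
    """Eq. (20c)."""
    return a + 1.0


if __name__ == "__main__":
    # Hypothetical bias point: 4 V of gate drive and Q/C_0 = 0.5 V.
    print(a_triode(4.0, 2.0), a_saturation(4.0, 0.5), a_star(a_saturation(4.0, 0.5)))
```

The two forms agree when $V_{GS} - V_T - V_{DS} = Q/C_0$, which is how (33a) goes over to (33b) at the saturation voltage, and $a$ vanishes at zero drain voltage.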

These calculations are shown for some typical devices in Figs. 5, 6, 
and 7 over a wide range of gate voltages, channel lengths, and gate 
oxide thicknesses. The approximate and exact predictions of the 
saturation voltage are also shown. It is noted that the logarithmic 
dependence of a upon drain voltage in (33) must be included to obtain 
an accurate result. The current therefore is a totally explicit function 
of the device voltages; the only iteration required is in the calculation 
of Vsar, as we mentioned earlier. As the figure indicates, the approx- 
imations are very good. 
[Fig. 5 plot: horizontal axis $V_G - V_T$ (V).]

Fig. 5—Variation of the voltage parameter $a$ at saturation with gate voltage for
various gate lengths.


Figure 8 shows the conventional square law result. For the 1.5-μm
device, for example, the effects of velocity saturation are relatively
unimportant for $V_{GS} = 2$ V but become progressively more significant as
the gate voltage is increased, so that for $V_{GS} = 6$ V there is a great
difference in both the current and the saturation voltage. It is
clear how the effects of a saturated drift velocity have reduced the
current available from the device for a given supply voltage. It is for
this reason that increasing the supply voltage has very little effect on
gate propagation delay beyond some particular value of voltage. From
(20b) we see that there is a significant departure from the square law
when
$$V_{DD} = V_T + \frac{\mathscr{E}_c L}{\sqrt{a}} \approx V_T + \mathscr{E}_c L \tag{34}$$


[Fig. 6 plot: $I_{DS}$ (A × 10⁻³). Curves: approximate, using $a = 1.24 \ln[(V_G - V_T)/(V_G - V_T - V_D)]$; exact, using $\mathscr{E}_c = 1.7 \times 10^4$ V/cm and $\mu_0 = 600$ cm²/V·s.]

Fig. 6—Comparison of the rigorous and approximate solutions for a wide range of
oxide thickness, gate length, and gate voltage. The voltage $V_{SAT}$ is shown by the arrows
that indicate the end of the line.


for typical values of $a$. We should, therefore, like to design for a supply
voltage of approximately (34) since we would then have the maximum
speed and the minimum power dissipation for a given channel length
of the driver in an inverter, for example. However, in practical
applications of short-channel devices the supply voltage will be higher than
this value to ensure adequate noise margins and a sufficient ratio of
$V_{DD}$ : $V_T$ due to processing tolerances on $V_T$. This will definitely be the
case if $V_T$ is increased intentionally in order to suppress subthreshold
leakage or if the supply voltage must be held arbitrarily at 5 V to
provide transistor-transistor logic (TTL) compatibility.
Fig. 7—Comparison of device characteristics for $L = 1.5$ μm using rigorous and
approximate forms. Also shown for $V_G = 6$ V is the result of treating the ln term as
constant.


The other point of interest in Fig. 8 is the comparison of the results
obtained using the piecewise-linear model of the velocity-field
characteristic and the more physical model presented here. The discrepancy
(indicated by the arrows in the figure) grows at first quickly and
then less rapidly as the gate voltage is increased; i.e., for lower gate
voltages such that $V_{GS} - V_T \approx V_{SAT}$ the percent error in the current
will be greater than that for $V_{GS} - V_T \gg V_{SAT}$. This is shown by the
plot in Fig. 9 for the device in Fig. 8. The maximum in this curve is
expected since the errors should be worst when the field at the drain
is $\mathscr{E}_c$ because then the differences in the models are at maximum.

Fig. 8—Comparison of velocity-saturated model for B = 2 with constant mobility
model. Also shown is the piecewise constant mobility model denoted by circles at
saturation.

2.3.2 Substrate bias dependence 


In the triode region the substrate bias enters entirely through the
threshold voltage as the body effect. It is also known that the threshold
voltage is dependent on drain voltage in a short-channel device and
this dependence is increased by the application of substrate bias.
These effects will be considered in the discussion of experimental
results.

Fig. 9—Error incurred using the piecewise constant mobility model as a function of
gate voltage.


2.3.3 Effects of contact resistance and a gate-voltage-dependent mobility 


In a real short-channel device, unavoidable series contact resistances
can introduce great differences between actual and expected device
currents. Also the mobility parameter, $\mu_0$, exhibits a gate voltage
dependence due to the increase in surface scattering with gate bias.”°
We will consider both of the effects here since, although unrelated,
they affect the device current in the same way. We will consider a
series resistance, $R$, in the drain and the source leads as shown in Fig.
2, and a mobility dependence of

$$\mu_0 = \frac{\mu_{00}}{1 + \theta(V_{GS} - V_T)}. \tag{35}$$

The parameter $\theta$ is an empirical constant that determines the
dependence of the mobility upon the normal channel field” and is thus
expected to have some substrate bias dependence. With these
additions, one can show from (10c) that the triode region result is modified
to


Typ 


Wu,C, 













BLL) 2K Vos 


ar) 
a a 2 


f 4, Ves = Vr)? = (Vos = Vr - Vos)? 
a 





StL  2R’ V; 
[RAE (ve -*2)) a0 


and the saturation region result is modified to 





(* , Ves = “| , Was = Vr)? = (@o/Co)* 


Co 2 a 
where

$$R' = R W v_s C_0, \tag{37b}$$

$$\mathscr{E}_c' = \mathscr{E}_c\left[1 + \theta(V_{GS} - V_T)\right], \tag{37c}$$

and

$$a' = 1 + a + 2 R W v_s C_0. \tag{37d}$$


In writing (37a) we have used both (18c) and (18d) to express the
linear and square root terms in $Q/C_0$ that appear when (22) is
substituted in (36). Therefore, we cannot expect a perfect match between
(36) and (37a) at the saturation point, although it will be close. To
achieve identical values we would have to use (18c) only and then
iterate (37a) to determine $I_{SAT}$.

The effects of the series resistance and the mobility reduction are
similar since they both act to increase the first term under the square
root sign in (36) and (37a). From (37c) it is seen that the mobility
degradation can be interpreted simply as a movement of $\mathscr{E}_c$ to higher
values on the velocity-field curve, as shown in Fig. 10. This has been
noted elsewhere for the constant mobility case® and can be generalized
to any velocity-field relation.

[Fig. 10 plot: horizontal axis $\mathscr{E}$ (V/cm) × 10⁴.]

Fig. 10—Variation of $\mathscr{E}_c$ as gate voltage is increased because of the reduction in $\mu_0$,
shifting the point of velocity saturation to higher $\mathscr{E}$.

It is interesting to consider the limiting forms of these two results.
From (36) by expanding the square root term in the limit of small gate
and drain voltages, we obtain

$$I = \frac{W v_s C_0}{2}\,\frac{2(V_{GS} - V_T)V_{DS} - V_{DS}^2}{\mathscr{E}_c' L + 2R'\left(V_{GS} - V_T - \dfrac{V_{DS}}{2}\right)}. \tag{38}$$

If we now consider the case of small drain voltages, which describes
the region in which we normally assess the dependence of $\mu_0$ upon
$V_{GS} - V_T$, we obtain

$$I = \frac{\dfrac{\mu_{00} C_0 W}{L}(V_{GS} - V_T)V_{DS}}{1 + \left(\theta + 2R\,\dfrac{\mu_{00} C_0 W}{L}\right)(V_{GS} - V_T)}. \tag{39}$$


From this result it is clear that one must be careful when extracting a
physically meaningful value of $\theta$ from experimental data since an
accurate value of $R$ must first be known. This can be shown clearly by
the data in Fig. 11 of the linear region current of three devices that
are identical except for the gate length. Each curve is characterized by
a section at lower gate voltages, which is linear, and a section at higher
gate voltages, which is nonlinear in gate voltage. The gate voltage at
which the sections join is marked in the figure by an arrow. The most
important feature of this voltage is that it moves to lower and lower
values for decreasing $L$. From (35) we see that this breakpoint should
be fixed in gate voltage; the reason for the movement to lower values
is the increasing importance of the term in $R$ in (38) for decreasing $L$.
If we use $\theta = 0.035$ and a value of $R = 90$ Ω, then we can fit the curves
as indicated. To give some idea of the importance of $R$, for the
long-channel device the choice of $R$ is immaterial within limits since $\theta$ is
dominant. For the smallest length ($L = 0.5$ μm) the choice of $\theta$ is not
important because the $R$ term is dominant. Thus, as channel lengths
are reduced it becomes increasingly important to reduce the series $R$.
Both the contact resistance and the mobility reduction will cause $V_{SAT}$
to increase for a given $V_G$.

Fig. 11—Linear region current data showing the effect of $R$ and $\theta$. The experimental
device parameters are $t_{ox} = 500$ Å, $N_A = 1 \times 10^{16}$ cm⁻³, and $\mu_{00} \approx 560$ cm²/V·s.
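
The way $R$ and $\theta$ become confounded in (39) can be illustrated numerically. In the sketch below the oxide permittivity (3.45 × 10⁻¹³ F/cm), the device width of 30 μm, and the long-channel length of 10 μm are assumptions made only for the example; the series resistance of 90 Ω is one reading of the value quoted above, and the function names are hypothetical.

```python
def gain_factor(mu00, c0, width_cm, length_cm):
    """mu_00 * C_0 * W / L in A/V^2."""
    return mu00 * c0 * width_cm / length_cm


def effective_theta(theta, r_ohm, mu00, c0, width_cm, length_cm):
    """Coefficient of (V_GS - V_T) in the denominator of (39), in 1/V."""
    return theta + 2.0 * r_ohm * gain_factor(mu00, c0, width_cm, length_cm)


def linear_current(vg, vds, theta, r_ohm, mu00, c0, width_cm, length_cm):
    """Eq. (39): small-V_DS drain current in amperes, vg = V_GS - V_T."""
    b = gain_factor(mu00, c0, width_cm, length_cm)
    th = effective_theta(theta, r_ohm, mu00, c0, width_cm, length_cm)
    return b * vg * vds / (1.0 + th * vg)


# Assumed oxide capacitance for t_ox = 500 angstroms (eps_ox = 3.45e-13 F/cm).
C0 = 3.45e-13 / 500e-8

if __name__ == "__main__":
    for length_um in (0.5, 10.0):
        th = effective_theta(0.035, 90.0, 560.0, C0, 30e-4, length_um * 1e-4)
        print(f"L = {length_um} um: apparent theta = {th:.3f} 1/V")
```

Under these assumptions the resistance term is more than ten times $\theta$ for the 0.5-μm device but smaller than $\theta$ for the 10-μm device, which mirrors the statement that $R$ dominates at short gate lengths and $\theta$ at long ones.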

Figure 12 shows the effects of a contact resistance of 50 Ω on the
device current for a typical short-channel device. To write down a
closed form for $I$, we have ignored the dependence of $a$ upon $R$ in
arriving at (36) and (37a). The difference this makes is indicated in
Fig. 12.

Fig. 12—Effect of series resistance on drain characteristics of a short-channel device.
The curves are calculated using (36) with $R = 0$ (solid line) and $R = 50$ Ω (dashed line).
The second dashed line shows the effects of neglecting any effect of $a$ upon $R$.


2.3.4 Incremental parameters and the continuity of derivatives 


2.3.4.1 Current-saturated operation. Since the current as derived in 
(20b) is a continuous function of gate voltage in moving from non- 
velocity-saturated to velocity-saturated operation, then the same is 
true for transconductance, and we obtain 
Similarly, the drain conductance is

We could also write these equations with $a^*$ everywhere replaced by
$a$ and $Q_0/C_0$ everywhere replaced by $Q/C_0$. In that case, in the
numerator of the square root terms we require the additional terms


= Q sae Q 
= OC, Z_ (2) and = QC, Vos ¢_(2) 
in $g_m$ and $g_d$, respectively.

In these equations, $y$ is a function of both the drain and gate voltages.
The parameter $a^*$ is related to $a$ by (20c), and in the pinch-off condition
[using (22) and (11b) in (20c)] $a^*$ is

$$a^* = 1 + \ln\left[\frac{V_{GS} - V_T}{\dfrac{Q_0}{C_0} + \dfrac{I}{W v_s C_0}}\right]. \tag{42}$$





Now the value of $a$ and its variations are unimportant for low values
of $V_{GS}$ because the conventional device equations (i.e., without velocity
saturation) apply and the forms for $g_m$ and $g_d$ are well known. On the
other hand, for operation in velocity saturation we have shown that
[cf. (32)] $I$ is a linear function of gate voltage and so (42) will be
relatively independent of gate and drain voltage. We can, therefore,
neglect the $a^*$ terms in (40) and (41) and use the value of $a^*$ that we
obtain at the pinch-off point to obtain a fairly good approximation.
However, in comparing theory with experiment for accurate results,
as we shall see, we must keep the dependence of $I$ in (42) and calculate
the current by one or two iterations. Some interesting features of (40)
and (41) are worth noting. In the limit of large gate voltages (i.e.,
velocity saturation) so that $V_{GS} - V_T \gg \mathscr{E}_c y$, (40) predicts that

$$g_m = \frac{W v_s C_0}{\sqrt{a^*}}, \tag{43}$$

and (41) predicts that
&a = C,Wo, 
(Sy)? 1 1 dy 1(Ve- Vr) da* 1 dVr 
le ae ee ee Ee 


2.3.4.2 Triode region. In the triode region the corresponding expres- 
sions for the gate transconductance and the drain conductance are 


&m = Wo, C, 
(2) dg rm (Ves — Vr) — (Vas — Vr — Vps) 


aja a 
_ [(Ves — Vr)? — (Ves — Vr — Vos)"] dic 
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EL as (Ves — Vr)* — (Vas — Vr — Vos) 
a a 
Gi. . 
+ 7 AG 
a 
and 
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CL ap Ves — Vr —- Vos 
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=~ [Wes'= Vr)? — (Ves — Vr — Vps)?]ap _ Vos dVr 
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EL 
In this case the terms in $a$ are necessary to obtain a reasonable solution
and so from (33a) we have

$$\frac{da}{dV_{DS}} = \frac{1.24}{V_{GS} - V_T - V_{DS}}, \qquad \frac{da}{dV_{GS}} = \frac{-1.24\,V_{DS}}{(V_{GS} - V_T)(V_{GS} - V_T - V_{DS})}. \tag{47}$$
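
These two derivatives follow directly from differentiating (33a), and a finite-difference comparison is a quick way to confirm the signs. The sketch below is illustrative; the function names and the bias point are hypothetical, and `vg` stands for $V_{GS} - V_T$.

```python
import math


def a_triode(vg, vds):
    """Eq. (33a) with vg = V_GS - V_T."""
    return 1.24 * math.log(vg / (vg - vds))


def da_dvds(vg, vds):
    """Analytic derivative of (33a) with respect to V_DS."""
    return 1.24 / (vg - vds)


def da_dvgs(vg, vds):
    """Analytic derivative of (33a) with respect to V_GS (V_T held fixed)."""
    return -1.24 * vds / (vg * (vg - vds))


def central_difference(f, x, h=1e-6):
    """Second-order finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


if __name__ == "__main__":
    vg, vds = 4.0, 1.5
    print(da_dvds(vg, vds), central_difference(lambda v: a_triode(vg, v), vds))
    print(da_dvgs(vg, vds), central_difference(lambda g: a_triode(g, vds), vg))
```

The gate-voltage derivative is negative: raising the gate drive at fixed drain voltage moves the device deeper into the triode region and reduces $a$.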
Again, as in the case of the saturation region, the formulas revert to
their conventional forms for small values of gate and drain voltages.
The main interest in considering the incremental conductance
parameters is to show that continuity exists in making the transition from
triode to saturation regions. We would like to show equivalence
between (45) and (40) and between (46) and (41) at pinch-off conditions.
By comparing (46) and (41) (written in terms of $a$ and $Q/C_0$) we find


(Ves - Vr-Vns)_ Q d (2) - dy ( I 


a ~~ aC, dVps \C,/ dV a \ Wr. C, 





or 








* dVps \Wu,Co 


Since the derivative on the left-hand side of (48) < 1, then (48) can 
be written as (173). As we show in Appendix A from (174) through 
(177) the derivative continuity equation (48) supplies the other con- 
dition needed together with (177) that allows determination of the 
parameters A, and h. 

To illustrate these results, the drain conductance has been plotted 
in Fig. 13 with drain voltage as a parameter. The accuracy is quite 
good and continuity of the derivative is preserved in moving from the 
triode to the saturation regions. 


Q/C. [1+ 72 @icn]=- « o ( : ). (48) 


2.3.5 Inclusion of the $V_{DS}$ dependence of $V_T$


2.3.5.1 Triode region. In the triode region the charge-sharing
formulation’® is used in which $V_T$ is expressed

$$V_T = V_{FB} + 2\phi_F + \frac{2\sqrt{2\varepsilon q N_A}}{3 C_0 V_{DS}}\left[(V_{DS} + V_{BS} + 2\phi_F)^{3/2} - (V_{BS} + 2\phi_F)^{3/2}\right]F_{ON},$$


where $F_{ON}$ is the charge-sharing factor in the triode region. Although
this result may be written approximately by expanding (for small $V_{DS}$)
as

$$V_T = V_{FB} + 2\phi_F + \frac{1}{C_0}\sqrt{2\varepsilon q N_A\left(\frac{V_{DS}}{2} + V_{BS} + 2\phi_F\right)}\,F_{ON},$$

it is important to note that the complete form must always be used
for circuit simulation purposes where discontinuities may not be
allowed between the triode and saturation regions. The complete form
provides a smooth transition for all voltages, whereas the approximate
form introduces small glitches for some values.

Fig. 13—Comparison of approximate and rigorous forms of the drain conductance.

2.3.5.2 Saturation region. In saturation operation the complete and
approximate equations for $V_T$ become

$$V_T = V_{FB} + 2\phi_F + \frac{2\sqrt{2\varepsilon q N_A}}{3 C_0 V_{SAT}}\left[(V_{SAT} + V_{BS} + 2\phi_F)^{3/2} - (V_{BS} + 2\phi_F)^{3/2}\right]F_{SAT} \tag{49}$$


and

$$V_T = V_{FB} + 2\phi_F + \frac{1}{C_0}\sqrt{2\varepsilon q N_A\left(\frac{V_{SAT}}{2} + V_{BS} + 2\phi_F\right)}\,F_{SAT}.$$

As in the triode region the complete form (49) must be used to avoid
glitches at the transition regions. This result was developed
previously.!® The difference in this case is that $V_{SAT}$ also depends upon the
charge at pinch-off and thus the current through (22). Therefore (22),
(19b), and (49) must be solved together to determine the current. To
obtain results that are smooth through all transitions, iteration is
required.


2.3.6 Effective value of $N_A$


Almost all practical devices have ion-implanted channels and hence
nonuniform doping profiles. Since the short-channel formulation
describes the device in terms of averages over the source-drain distance,
a reasonable approach is to use an average value of $N_A = \bar{N}_A$ over the
same distance. We can calculate this value by conserving charge in
the manner

$$\bar{N}_A x_d = \int_0^{x_d} N_A(x)\,dx, \tag{50}$$


where $x_d$ is the average depletion width between the drain and the
source and has been shown to be?é

$$x_d = \sqrt{\frac{2\varepsilon}{q\bar{N}_A}\left(V_{BS} + 2\phi_F + \frac{V_{DS}}{2}\right)}. \tag{51}$$

We can see from this result that the effective value of $\bar{N}_A$ is in general
a function of both drain and substrate bias.

To determine $\bar{N}_A$ we will assume that any ion-implanted profile may
be described by the parameters $N_p$, $\sigma$, and $R_p$; i.e., even after thermal
processing it is assumed that an $N_p$, $\sigma$, and $R_p$ may be found to give a
best fit to the measured impurity profile so that


z 
Na(x) = (Np — Nas)exp -() | + Nas, (52) 


where $N_{AB}$ is the background impurity doping. Using (52) and (51) in
(50) we then have

for the saturation region. If we use a polynomial expansion for the erf
term to allow numerical computation, only a few iterations are required
to obtain $\bar{N}_A$. For the most accurate results, then, a value of
$\bar{N}_A(V_{DS}, V_{BS})$ is required at each bias point, although in practice only
the $V_{BS}$ dependence is very important, so an average value of $V_{DS}$
could be used for all computation.


III. COMPARISON WITH EXPERIMENT


To establish the validity of the parameter B = 2, the $\sqrt{I}$ versus $V_{GS}$
data were plotted in Fig. 14 with drain voltage $V_{DS} = V_{GS}$ for the long-
and short-channel devices discussed in Fig. 11. Using all of the
parameters determined from the linear region data, theoretical curves
were generated and are also plotted. The drain-voltage dependence of
$V_T$ was included using the charge-sharing technique,!® and the
variation of $y$ was taken from the drain-voltage data in order to verify the
triode region model alone. It is evident that the agreement is very
good. To compare with the other models, the results of using B = 1
and the piecewise continuous model (i.e., constant mobility) are also
shown. Both are in error, one underestimating and the other
overestimating the actual current.

Fig. 14—Saturation region data for long- and short-channel devices plotted on a $\sqrt{I}$
scale for the condition $V_{DS} = V_{GS}$.

One can identify two sections in this curve. At low voltages the $\sqrt{I}$
curve is linear in $V_{GS}$, indicating long-channel behavior; for higher
gate voltages the $\sqrt{I}$ curve is sublinear in $V_{GS}$, indicating the onset of the
effects of velocity saturation. The change from one behavior to the
other is gradual and takes place when the terms under the root sign
in (20b) are approximately equal; i.e., at a current level of about


(Ves — Vr) 
va® 
This result should be independent of gate length, as we can see from
(17c), which predicts the transition from $\mathscr{E} < \mathscr{E}_c$ to $\mathscr{E} > \mathscr{E}_c$ to occur
for a unique value of $I$, which is


I=v,WC, 
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pee 

(1 — R) 

This behavior is observed in Fig. 14, although there is some decrease 
for the very shortest gate length. 

As a final demonstration of the model the drain current data for
two devices are shown in Figs. 15a and b. The channel length
modulation term can be calculated using (169). However, this description is
only valid for large gate voltages, i.e., when $\mathscr{E}$ is $\geq 10^4$ V/cm. As we
approach the threshold condition, the derivation breaks down and
$L - y$ becomes anomalously large. However, near the threshold
condition and in the subthreshold region, we know from other work that
the simple form of

$$L - y = \sqrt{\frac{2\varepsilon}{q N_A}(V_{DS} - V_{SAT})}$$

works well. We will, therefore, combine these two results to obtain a
continuous solution


L-3 Ai(Vps — Vsar)*/4 
ay : 
qNa 


E+ Ai(Vps — y)i4 9 
€ 


In Fig. 15a the fabrication parameters are L = 0.32 wm (1-um coded 
length and 0.68-~m compensation as determined from 1/g,, measure- 
ments), W = 30 um, tox = 230A, r; = 0.26 wm, and Nag = 8 X 10% 
cm~°*. The implanted doping parameters were determined by simula- 
tion to be N, = 5 X 10°° cm™, R, = 0.5 um, and o = 0.1 um. Other 
parameters determined from low voltage data are $R = 15$ through 20 Ω,
$\mu_0 = 650$ cm²/V·s, and $\theta = 0.02$ V⁻¹. The agreement between theory and
experiment is very good for $V_{GS} > 1$ V. For smaller gate voltages the
characteristic is dominated by uncontrolled source-drain
punchthrough, which is not described by these models.

In Fig. 15b the parameters are the same but L = 0.85 um, so in this 
case the comparison is extended down to the threshold region (because 
source-drain punchthrough is not a limiting factor). 


IV. SUMMARY 

A model has been developed to describe the velocity-saturated
characteristics of short-channel MOSFETs. The model has been based
upon the velocity-field relation that most nearly fits the experimental
data, and it is found that the bulk relationship based upon optical
phonon emission is most appropriate if the zero field (parallel to the
surface) mobility is simply modified to take into account the effects
of increased scattering in the potential well at the
semiconductor-insulator interface. By using this approach, the electron temperature
has been brought into the problem and becomes the necessary
ingredient that allows a smooth transition from the long-channel to the
short-channel behavior. The model is found to fit well with
experimental data.

Fig. 15(a)—Drain-voltage versus drain-current data for a short-channel MOSFET
showing comparison with theory. $L = 0.32$ μm.

Fig. 15(b)—Drain-voltage versus drain-current data for a short-channel MOSFET
showing comparison with theory. $L = 0.85$ μm.
APPENDIX A
A.1 Introduction 


In existing analytical approaches to pinch-off operation, the field at
the edge of the pinch-off region is held to the value $\mathscr{E}_c$, the critical
field parameter in the velocity-field relationship.®'””" To avoid such
an arbitrary condition, a description is presented here of the pinch- 
off region under conditions of hot-electron transport, which is based 
upon the boundary condition of a value of the electric field at the 
pinch-off point determined uniquely by the current and hence applied 
voltages. The field patterns and charge density distribution throughout 
the pinch-off zone are predicted. 

The solution allows one to determine the effect of the gate voltage 
on the channel field at pinch-off through the channel current itself.
Hence, there is direct feedback between the channel current and the
extent of channel length modulation. It is demonstrated that for low 
gate voltages the channel length modulation may be severe and the 
current then shows a large variation with drain voltage. However, for 
larger gate voltages, electrostatic feedback to the channel becomes a 
dominant effect, and the current shows little variation with drain 
voltage, i.e., the degree of channel length modulation becomes very 
small. 


A.2 General considerations and assumptions 


It is helpful initially to review the important approximations and 
salient results that are obtained from simplified long-channel theory. 
For the purposes of the discussion, Fig. 16 shows a cross section of the 
device, which indicates the major current flow patterns and electric- 
field lines. 


A.2.1 Existing theories 


Fig. 16—MOS device cross section showing schematically the major lines of current
flow (dashed arrows) and electric-field patterns (solid arrows).

A basic assumption of MOS theory is that the gradual channel
approximation (GCA) is valid throughout the unsaturated region. In
a physical sense this approximation assumes that the transverse
electric-field strength, $\partial\phi/\partial x$, is large compared to the longitudinal
electric-field strength, $\partial\phi/\partial y$, in the channel region ($\phi$ is the
electrostatic potential, x is the direction perpendicular to the gate, and y is
the direction parallel to the channel). Hence, the charge Q(y) in the
channel may be determined by


$$Q(y) = C_o\,[V_G - V_{FB} - 2\phi_F - V(y)] - Q_B(y), \qquad (54)$$


where $C_o$ is the oxide capacitance, $V_{FB}$ is the flatband voltage, V(y) is
the channel potential, and $Q_B(y)$ is the substrate depletion charge.
The substrate charge is determined, using the depletion approximation,
to be


$$Q_B(y) = \sqrt{2\epsilon q N_a\,[V(y) + 2\phi_F + V_{BS}]}, \qquad (55)$$


where $N_a$ is the substrate doping, q is the electronic charge, $\epsilon$ is the
silicon dielectric permittivity, and


$$\phi_F = \frac{kT}{q}\ln\left(\frac{N_a}{n_i}\right). \qquad (56)$$


The bulk Fermi potential is written in terms of $n_i$, the intrinsic silicon
concentration. Another important assumption of the theory is that
drift is the dominant conduction mechanism so that the continuity
equation may be written


$$I = W\mu\,Q(y)\,\mathcal{E}(y), \qquad (57)$$


where I is the source-to-drain current, W is the device width, $\mu$ is the
channel mobility, and $\mathcal{E}(y)$ is the channel field. The channel current
is obtained from (55), (56), and (57) to yield the familiar equation


$$I = \frac{\mu C_o W}{L}\left\{\left(V_G - V_{FB} - 2\phi_F - \frac{V_{DS}}{2}\right)V_{DS} - \frac{2\sqrt{2\epsilon q N_a}}{3C_o}\left[(V_{DS} + V_{BS} + 2\phi_F)^{3/2} - (V_{BS} + 2\phi_F)^{3/2}\right]\right\}. \qquad (58)$$


The phenomena of current saturation and pinch-off in the channel
are predicted to occur when the free charge in the channel goes to
zero. From a solution of (54) for


$$Q(y) = 0 \qquad (59)$$

one obtains

$$V_{SAT} = V_G - V_{FB} - 2\phi_F + \frac{\epsilon q N_a}{C_o^2}\left[1 - \sqrt{1 + \frac{2C_o^2\,(V_G - V_{FB} + V_{BS})}{\epsilon q N_a}}\,\right]. \qquad (60a)$$

The corresponding value of the saturated current is then found by
substitution of (60a) into (58), and in the simple case, this procedure
yields


$$I_{SAT} = \frac{\mu C_o W}{2L}\,(V_G - V_T)^2, \qquad (60b)$$


where $V_T$ is the simplified threshold value. In the general case, the
more complex drain-voltage-dependent form of $V_T$ must be included
in (60b), but the result is still evaluated in a straightforward manner.
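The long-channel relations above can be checked numerically. The following is a minimal sketch, assuming the standard textbook forms of (54), (55), (56), (58), and (60a); the function names and the device values (doping, oxide thickness, bias voltages) are hypothetical and chosen only for illustration, in SI units.

```python
import math

Q_E = 1.602e-19              # electronic charge q (C)
K_B = 1.381e-23              # Boltzmann constant k (J/K)
EPS_SI = 11.7 * 8.854e-12    # silicon permittivity (F/m)


def fermi_potential(n_a, n_i=1.45e16, temp=300.0):
    """Bulk Fermi potential (56); n_a and n_i in m^-3."""
    return K_B * temp / Q_E * math.log(n_a / n_i)


def channel_charge(v, v_g, v_fb, v_bs, phi_f, c_o, n_a):
    """Mobile charge per unit area (54), using the depletion charge (55)."""
    q_b = math.sqrt(2.0 * EPS_SI * Q_E * n_a * (v + 2.0 * phi_f + v_bs))
    return c_o * (v_g - v_fb - 2.0 * phi_f - v) - q_b


def saturation_voltage(v_g, v_fb, v_bs, phi_f, c_o, n_a):
    """Channel potential at which Q = 0, i.e. (59) solved as in (60a)."""
    k = EPS_SI * Q_E * n_a / c_o ** 2
    a = v_g - v_fb + v_bs
    return v_g - v_fb - 2.0 * phi_f + k * (1.0 - math.sqrt(1.0 + 2.0 * a / k))


def triode_current(v_ds, v_g, v_fb, v_bs, phi_f, c_o, n_a, mu, w_over_l):
    """Drain current (58); physically meaningful up to the saturation voltage."""
    body = 2.0 * math.sqrt(2.0 * EPS_SI * Q_E * n_a) / (3.0 * c_o)
    base = 2.0 * phi_f + v_bs
    return mu * c_o * w_over_l * (
        (v_g - v_fb - 2.0 * phi_f - 0.5 * v_ds) * v_ds
        - body * ((v_ds + base) ** 1.5 - base ** 1.5)
    )


# Hypothetical device: N_a = 1e15 cm^-3, 2000 A oxide, V_G = 5 V, V_FB = -1 V.
N_A = 1.0e21                          # m^-3
C_O = 3.9 * 8.854e-12 / 200e-9        # oxide capacitance per unit area (F/m^2)
PHI_F = fermi_potential(N_A)
V_SAT = saturation_voltage(5.0, -1.0, 0.0, PHI_F, C_O, N_A)
```

Because dI/dV_DS in (58) is proportional to Q evaluated at the drain, the current given by (58) peaks exactly where (59) is satisfied, which is why (60a) marks the onset of saturation in this simplified picture.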


A.2.2 Device characteristics in the saturation region 


For voltages $V_{DS} > V_{DSAT}$, the drain depletion region extends
towards the source. The position of pinch-off in the channel, occurring
initially at the drain for $V(y) = V_{DSAT}$, will also move towards the
source, since the condition (59) of zero charge may always be found 
somewhere in the channel for a function like (54), which is decreasing 
with voltage. Because of this feature, the potential at this position will 
remain constant for increasing drain voltage. 

The use of (59) in the channel in saturation is, of course, an 
approximation that is used solely to determine the drain saturation 
voltage. Actually, the electron density is decreasing rapidly with dis- 
tance in this region, and if we allow it to become arbitrarily small, 
then the electric field will have to become anomalously large if the 
drift component is to continue to provide continuity of current. We 
must therefore conclude that the drift component can no longer 
account for the total current flow and hence that the diffusion of 
carriers becomes an important conduction mechanism. This conclu- 
sion is supported by the numerical computations of other authors.!»"” 
We show here that diffusion constitutes a specific fraction R of the
current flow at the pinch-off point, and in Appendix B values for R
are derived in the range of one third to one half.

As a basis for this work, we assume that we may represent the MOS 
device by four different regions of operation, as shown in Fig. 17. The 
quadrants are divided on the vertical axis by the subthreshold and 
above-threshold regions of operation and on the horizontal axis by the 
position of pinch-off in the channel. The quadrants are distinguished 
by the approximations designated in the figure, which denote for each 
quadrant the dominant conduction mechanism and the principal
direction of the electric-field lines. In quadrants II and III the GCA
applies, which may be stated


$$\frac{\partial^2\phi}{\partial x^2} \gg \frac{\partial^2\phi}{\partial y^2}. \qquad (61)$$

Fig. 17—Regions of operation for a MOSFET characterized by the approximations
that may be used.


However, in quadrant I neither of these approximations is valid since
the two terms are of the same order, which we write


$$\partial^2\phi/\partial x^2 \sim \partial^2\phi/\partial y^2. \qquad (62)$$


The validity of (61) has been well established in MOS theory; the
validity of (62) will be justified here by the solutions obtained. The
other approximations indicated in Fig. 17 are that in quadrants II and
IV we have


$$I_{DRIFT} > I_{DIFF} \qquad (63a)$$

and in quadrant III we have

$$I_{DRIFT} < I_{DIFF}. \qquad (63b)$$


In quadrant I we also have $I_{DRIFT} > I_{DIFF}$, but near the boundary we
have

$$I_{DRIFT} \sim I_{DIFF}, \qquad (63c)$$


which means that the components are comparable in size. The as- 
sumptions in quadrants II, III, and IV have been established by others; 
the validity of the assumptions in quadrant I and on its boundary will 
be considered here. 

In the discussion of hot-electron transport we will use the mobility-field
relationship


$$\mu(\mathcal{E}) = \frac{\mu_0}{\sqrt{1 + (\mathcal{E}/\mathcal{E}_c)^2}}, \qquad (64)$$


where $\mathcal{E}$ is the electric field and $\mathcal{E}_c$ is a critical field parameter that
is related to the low-field mobility $\mu_0$ by the relation


$$\mu_0\,\mathcal{E}_c = v_s. \qquad (65)$$

Also, the Einstein relationship will be assumed to hold; it states

$$\frac{D(\mathcal{E})}{\mu(\mathcal{E})} = \frac{kT(\mathcal{E})}{q}, \qquad (66)$$


where $D = D(\mathcal{E})$ is the field-dependent diffusion coefficient and
$T = T(\mathcal{E})$ is the temperature of the electron in the hot-electron regime.
We will now discuss the conditions in regions I and II.
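The mobility-field and Einstein relations can be sketched in a few lines. This is an illustrative sketch only: it assumes (64) has the form $\mu_0/\sqrt{1 + (\mathcal{E}/\mathcal{E}_c)^2}$, and the function names and numerical values of the low-field mobility and critical field are hypothetical.

```python
import math


def mobility(e_field, mu0, e_crit):
    """Field-dependent mobility, read here as mu0 / sqrt(1 + (E/E_c)^2)."""
    return mu0 / math.sqrt(1.0 + (e_field / e_crit) ** 2)


def drift_velocity(e_field, mu0, e_crit):
    """Drift velocity mu(E) * E; tends to mu0 * E_c at high field, as in (65)."""
    return mobility(e_field, mu0, e_crit) * e_field


def diffusion_coefficient(e_field, mu0, e_crit, thermal_voltage):
    """Einstein relation (66): D = mu * (kT/q), with kT/q supplied in volts."""
    return mobility(e_field, mu0, e_crit) * thermal_voltage


MU0 = 0.06            # hypothetical low-field mobility (m^2/V/s)
E_CRIT = 1.5e6        # hypothetical critical field (V/m)
V_S = MU0 * E_CRIT    # saturation velocity implied by (65) (m/s)
```

With these definitions the velocity rises linearly at low field and levels off at V_S once the field greatly exceeds E_CRIT, which is the behavior the hot-electron discussion relies on.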


A.2.2.1 Region II. The current flow is approximately one-dimensional 
(parallel to the surface) because the dominant transverse field confines 
the carriers to a narrow potential well next to the surface. This region 
is described by (54), (55), and (57). Although drift is dominant in this 
region, it is also of interest to calculate the diffusion component. From 
(54) the mobile charge gradient is 


$$\frac{dQ}{dy} = -(C_o + C_s)\,\frac{dV}{dy}, \qquad (67)$$


where $C_s$, the space-charge capacitance, is


$$C_s = \sqrt{\frac{q N_a \epsilon}{2\,(V + 2\phi_F + V_{BS})}} \qquad (68)$$


and V = V(y). The electric field is obtained from (57) as

$$\mathcal{E}(y) = \frac{I(1 - R)}{W\mu\,Q(y)}, \qquad (69)$$


where R represents the fraction of the total current carried by diffusion.
By using (69), (67), and (54), the diffusion component is


$$RI = I_{DIFF} = \frac{D(\mathcal{E})\,I\,(C_o + C_s)(1 - R)}{\mu(\mathcal{E})\left[C_o(V_G - V_{FB} - 2\phi_F - V) - \sqrt{2\epsilon q N_a\,(V + 2\phi_F + V_{BS})}\,\right]}. \qquad (70)$$
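To see how the diffusion fraction grows along the channel, (70) can be rearranged as R/(1 - R) = (D/mu)(C_o + C_s)/Q and solved for R. The sketch below assumes that rearrangement, the Einstein relation D/mu = kT/q, and the forms of (54) and (68) given in this section; the function names and device values are hypothetical.

```python
import math

Q_E = 1.602e-19              # electronic charge q (C)
EPS_SI = 11.7 * 8.854e-12    # silicon permittivity (F/m)


def depletion_capacitance(v, phi_f, v_bs, n_a):
    """Space-charge capacitance C_s of (68), per unit area."""
    return math.sqrt(Q_E * n_a * EPS_SI / (2.0 * (v + 2.0 * phi_f + v_bs)))


def mobile_charge(v, v_g, v_fb, v_bs, phi_f, c_o, n_a):
    """Mobile charge Q of (54) with the depletion charge (55)."""
    q_b = math.sqrt(2.0 * EPS_SI * Q_E * n_a * (v + 2.0 * phi_f + v_bs))
    return c_o * (v_g - v_fb - 2.0 * phi_f - v) - q_b


def diffusion_fraction(v, v_g, v_fb, v_bs, phi_f, c_o, n_a, thermal_voltage):
    """Solve (70) for R at channel potential v (valid while Q > 0)."""
    c_s = depletion_capacitance(v, phi_f, v_bs, n_a)
    q_mobile = mobile_charge(v, v_g, v_fb, v_bs, phi_f, c_o, n_a)
    x = thermal_voltage * (c_o + c_s) / q_mobile
    return x / (1.0 + x)


# Hypothetical bias and device values (SI units).
BIAS = dict(v_g=5.0, v_fb=-1.0, v_bs=0.0, phi_f=0.288,
            c_o=1.7265e-4, n_a=1.0e21, thermal_voltage=0.0259)
R_SOURCE = diffusion_fraction(0.0, **BIAS)
R_MID = diffusion_fraction(2.0, **BIAS)
R_NEAR_PINCH_OFF = diffusion_fraction(3.2, **BIAS)
```

In this sketch R is of the order of one percent near the source and rises steeply as the mobile charge collapses toward pinch-off, consistent with the statement that drift dominates region II while diffusion becomes significant at its boundary.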


A.2.2.2 Region I. As the diagram in Fig. 16 shows, the problem in 
this region becomes two-dimensional. In region II, the electric field is 
directed from the channel towards the gate but is steadily decreasing 
as the channel potential increases. Somewhere beyond $\bar{y}$, the pinch-
off position, the field in the oxide is equal to zero, and beyond this
position the electric field is directed from the gate into the silicon; 
hence, many of the field lines terminate on the drain electrode, as 
shown, since the drain potential exceeds the gate potential in this 
region of operation. Because the transverse field in region I no longer 
creates a potential well, the mobile carriers may flow away from the 
surface as they approach the drain, resulting in a two-dimensional 
current flow. 

The main features of this representation are depicted in Fig. 18, 
which shows an expanded view of region I. Consider the flow of carriers 
at some position y in region I. Because of the strong electric field 
indicated in Fig. 18, which is directed from the gate towards the 
substrate in region I, the electrons are forced away from the surface. 
Therefore, the density of electrons will be reduced at the surface. For 
sufficiently large values of x (i.e., deeper in the Si) the electron density 
must decrease again to its substrate value, and so we conclude that 
the electron density must exhibit a maximum as a function of x for a 
given value of y. We will, therefore, represent the electron density in 
a general way by a function of the form 


n(w, r) = N(r)exp{— A(r)[w — ws (r)]’ 
— B(r)[w — wi (r)ft — ---}, (71) 


where the higher-order terms in the distribution would be required for 
strong deviations from the normal case. The expansion for the electron 
density has been written for the generalized coordinate axes w, r rather 
than for x, y to allow for the fact that the x, y system may not be the 
most convenient one in which to describe the distribution. The y or r 
dependence has been incorporated into this form through the param- 
eters N(r), A(r), B(r), we (r), and w*(r). In this way, the median value, 
the mean position, and the width of the distribution may change with 
y or r. The functions w*(r) describe the locus of the mean of the 
electron distribution between the pinch-off point and the drain as 
illustrated in Fig. 18. In the approach taken here, we have only three
physical relationships available to us for the determination of the
electron distribution, so we will restrict (71) to three unknown
functions, i.e.,


$$n = n(w, r) = N(r)\exp\left\{-\frac{[w - w^*(r)]^2}{2\sigma^2(r)}\right\}, \qquad (72)$$


where A(r) has been rewritten as $1/2\sigma^2(r)$ and in which the functions
N(r), $\sigma(r)$, w*(r) remain to be determined from three physical
relationships, which will now be described.

Fig. 18—Expanded view of the pinch-off zone with important physical and electrical
parameters.
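The Gaussian form (72) fixes the relation between the peak density N and the total number of electrons per unit area once $\sigma$ and w* are known. The following sketch, with hypothetical function names and parameter values, integrates the profile from the surface (w = 0) outward and compares the result with the closed form for a half-line Gaussian integral.

```python
import math


def electron_density(w, peak, w_star, sigma):
    """Gaussian profile of (72) at arc length w from the interface."""
    return peak * math.exp(-((w - w_star) ** 2) / (2.0 * sigma ** 2))


def sheet_density_numeric(peak, w_star, sigma, w_max, steps=20000):
    """Trapezoidal integral of n over w from the surface (w = 0) to w_max."""
    h = w_max / steps
    total = 0.5 * (electron_density(0.0, peak, w_star, sigma)
                   + electron_density(w_max, peak, w_star, sigma))
    for i in range(1, steps):
        total += electron_density(i * h, peak, w_star, sigma)
    return total * h


def sheet_density_closed_form(peak, w_star, sigma):
    """Integral of the Gaussian over w >= 0, expressed with the error function."""
    return (peak * sigma * math.sqrt(math.pi / 2.0)
            * (1.0 + math.erf(w_star / (sigma * math.sqrt(2.0)))))


# Hypothetical profile: peak 1e23 m^-3 centered 50 nm deep, 10 nm wide.
PEAK, W_STAR, SIGMA = 1.0e23, 50e-9, 10e-9
NUMERIC = sheet_density_numeric(PEAK, W_STAR, SIGMA, W_STAR + 10.0 * SIGMA)
CLOSED = sheet_density_closed_form(PEAK, W_STAR, SIGMA)
```

When the distribution is well localized away from the surface (w* much larger than $\sigma$), the result reduces to $N\sigma\sqrt{2\pi}$, which is the normalization used for the boundary conditions later in this appendix.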

The first relationship is the description of the current flow
mechanism, and quite simply it is


Wr -[" DS de ef p&ndw (73)
for any y between $\bar{y}$ and L, where w = 0 is the silicon surface. The
second relationship is the law of current continuity. Physically speak- 
ing, the current flow in the x direction must provide the current 
density gradient by means of which the current density in the y 
direction is able to change. This condition is stated formally as 





$$\nabla\cdot\mathbf{J} = 0 \qquad (74)$$
or, for the case being considered,
$$\frac{\partial J_x}{\partial x} = -\frac{\partial J_y}{\partial y},$$


where J, and J, are the current density components in the x and y 
directions, respectively. The third relationship is the two-dimensional 
Poisson equation 








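$$\nabla\cdot\boldsymbol{\mathcal{E}} = \frac{\rho}{\epsilon}, \qquad (75)$$
which is written here as
$$\frac{\partial\mathcal{E}_x}{\partial x} + \frac{\partial\mathcal{E}_y}{\partial y} = -\frac{q}{\epsilon}\,(N_a + n), \qquad (76)$$
subject to the boundary condition
$$\mathcal{E}_y = \bar{\mathcal{E}} \quad \text{at } y = \bar{y}, \qquad (77)$$


where $\bar{y}$ is the pinch-off point and $\bar{\mathcal{E}}$ is the value of longitudinal
electric field in the channel at the pinch-off point. Both x and y
components need to be retained in (76) since the concentration of
field lines is comparable in both x and y directions. The value of the
field along r* will be discussed later.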

In the section of the channel between the source and the field- 
inversion point, (64) is a one-dimensional relationship between the 
channel field and the mobility. Between the field inversion point and 
the drain, the mobility becomes a scalar field determined by the vector 


P=rE+oF. 
Since 

Sr = Ci+ £2, 
(64) becomes 


Ve] *\e 


Also, the one-dimensional hot-electron temperature
c


we\ 
anv (SZ) a 
may be expressed 


ca sg’ 
Be = Bo + Bo (4) + Bo (2) ’ (78c) 
which is a scalar field, where $\beta_e = kT_e/q$. Therefore, the mobility (78a)
may be expressed in terms of the temperature $T_e$ or, equivalently, the
voltage $\beta_e$ using (78a) and (78c) as


$$\frac{\mu}{\mu_0} = \sqrt{\frac{\beta_0}{\beta_e}}. \qquad (78d)$$

A.3 Determination of physical parameters in pinch-off operation 
A.3.1 Charge, voltage, and field at the pinch-off position 


The boundary between regions II and I will be defined as the pinch- 
off position, and all variables at this point will be designated with a 
bar. As the position $\bar{y}$ is approached from the source, the drift current
will be a decreasing function of y and the diffusion current will be an
increasing function of y. At $\bar{y}$, the drift and diffusion components of
the current are


$$I_{DRIFT} = (1 - R)\,I \qquad (79)$$
and
$$I_{DIFF} = R\,I. \qquad (80)$$

The method for the determination of R is outlined in Appendix B [see 
(200)]. It is determined only by doping and oxide thickness and has 
typical values of one third to one half. From (70) and (66) we have 


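$$C_o(V_G - V_{FB} - 2\phi_F - \bar{V}) - \sqrt{2\epsilon q N_a\,(\bar{V} + 2\phi_F + V_{BS})} = \frac{kT}{q}\left(\frac{1 - R}{R}\right)[C_o + C_s(\bar{V})]. \qquad (81)$$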


Equation (81) may also be written approximately as 


Vie VS Vein (4) we ieae +01). 
The relationship (81) is identical to that obtained in the original MOS 
theory” except for the extra term on the right-hand side. In other 
words, rather than using (59) to determine the pinch-off voltage, we 
are taking account of the charge in the channel at pinch-off, and from 
(81) and (54) the charge is determined to be 


(82)

$$\bar{Q} = \frac{kT}{q}\left(\frac{1 - R}{R}\right)(C_o + C_s). \qquad (83)$$


For $T = T_0$ (the lattice temperature, as is the case in long-channel
devices), $\bar{Q}$ is small (provided that R = 0.3) and (82) yields the
conventional result. The significance of (83)* is that we can now
determine the longitudinal field in the channel at pinch-off, and from
(83) and (69) it is

$$\bar{\mathcal{E}} = \frac{I R}{W\mu\,\dfrac{kT}{q}\,(C_o + C_s)}. \qquad (84)$$

Equation (84) applies for any set of voltage variables in pinch-off
operation. The dependence of $\bar{\mathcal{E}}$ on $\mu$ from (64) could be substituted
here, and since the current at the onset of pinch-off operation (i.e.,
the boundary between triode and saturation regions) is known, then
$\bar{\mathcal{E}}$ at the onset of pinch-off may also be expressed uniquely in terms
of the applied voltages. It is therefore an ideal boundary condition for
the pinch-off zone.
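The pinch-off charge and field can be evaluated directly once R is known. This sketch assumes (83) reads $\bar{Q} = (kT/q)[(1 - R)/R](C_o + C_s)$ and that (84) follows from substituting it into (69); the function names and the numerical values are hypothetical.

```python
def pinch_off_charge(r, thermal_voltage, c_o, c_s):
    """Mobile charge per unit area at the pinch-off point, as in (83)."""
    return thermal_voltage * (1.0 - r) / r * (c_o + c_s)


def pinch_off_field(current, r, width, mu, thermal_voltage, c_o, c_s):
    """Longitudinal field at the pinch-off point, as in (84)."""
    return current * r / (width * mu * thermal_voltage * (c_o + c_s))


def field_from_drift(current, r, width, mu, q_mobile):
    """Channel field from the drift share of the current, as in (69)."""
    return current * (1.0 - r) / (width * mu * q_mobile)


# Hypothetical operating point (SI units).
R_FRACTION = 1.0 / 3.0
KT_Q = 0.0259
C_O, C_S = 1.7265e-4, 4.7e-5
CURRENT, WIDTH, MU = 1.0e-3, 10e-6, 0.03
Q_BAR = pinch_off_charge(R_FRACTION, KT_Q, C_O, C_S)
E_BAR = pinch_off_field(CURRENT, R_FRACTION, WIDTH, MU, KT_Q, C_O, C_S)
```

The field scales linearly with the current at fixed R, which is the sense in which the boundary condition is set by the applied voltages rather than pinned to an arbitrary critical value.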

It is also of some interest to determine the sign and magnitude of
$\mathcal{E}_x$ at the pinch-off point. From the continuity of the divergence of $\mathcal{E}$
we can relate the oxide and substrate fields as


$$\epsilon\,\mathcal{E}_x = \epsilon_{ox}\,\mathcal{E}_{ox}, \qquad (85)$$


where $\mathcal{E}_x$ may be determined, as shown by Pao, by a detailed solution
of Poisson’s equation in the x direction in which both mobile and fixed
charge components are retained. An equivalent representation of $\mathcal{E}_{ox}$
is


$$\mathcal{E}_{ox} = \frac{V_G - \phi_s - V(y) - V_{FB}}{t_{ox}}, \qquad (86)$$

where $\phi_s$ is the surface potential at the source. As is well known, $\phi_s$
takes the value of $2\phi_F$ at threshold and increases only slightly for
additional increases in gate voltage. Therefore, using (82) in (86) we
find that

as + V2eqNa(2¢r + V; + Vepp/C, 

Wie Q + V2eqNa(2¢r Bs) FB/ (87a) 


€ox 
and 


Z. = Q + V2eqNa(2¢r + Vas) + Vrs/C, (87b) 


€ 


* To include short-channel effects it is necessary to replace C, with C,F.

Hence, we conclude that both $\mathcal{E}_x$ and $\mathcal{E}_{ox}$ start from large positive
values at the source and decrease to the much smaller values given by
(87a) and (87b) at the pinch-off position. These values are practically
constant with increasing gate voltage except for the small increases in
$\phi_s$ above $2\phi_F$. The fields are directed towards the gate electrode at $\bar{y}$.
These results are illustrated by the plots of Pao, which are reproduced
in Fig. 19. We have extended the field values past the pinch-off point
to show that a reversal in sign of $\mathcal{E}_x$ occurs as shown by MaGowan’s
numerical solution.

Fig. 19—Variation of transverse electric-field strength in the silicon at the surface as
a function of distance along the surface from the source and through the pinch-off point
(horizontal axis: $y/\bar{y}$); parameters $t_{ox}$ = 2000 Å, Np = 4.6 X 10 cm™, and L = 5 μm.


A.3.2 Formulation of the differential equations beyond the pinch-off 
point 

Using the relationships described earlier, we may now find the 
variation of electron density and electric fields in region I. The two- 
dimensional nature of the problem, which we will now discuss, applies 
only after field inversion has occurred. The section between pinch-off 
and field inversion will be considered later. 

Consider the vector field for the current flow in region I, 


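$$\mathbf{J} = J_x\,\hat{\mathbf{x}} + J_y\,\hat{\mathbf{y}}, \qquad (88)$$


where $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ are unit vectors in the x and y directions. The vector
components are the current densities in the x and y directions and
may be written generally as


$$\frac{J_y}{q} = D\,\frac{\partial n}{\partial y} + \mu n\,\mathcal{E}_y \qquad (89)$$

for the y direction and

$$\frac{J_x}{q} = D\,\frac{\partial n}{\partial x} + \mu n\,\mathcal{E}_x \qquad (90)$$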


for the x direction. At this point we would like to choose a set of 
coordinate axes to most suitably represent the current flow. We know 
that along the oxide-silicon interface the flow is parallel to the inter- 
face and that along the streamline the flow is in the direction of the 
streamline. This fact suggests that we should use polar coordinates r
and $\theta$ to represent the problem, where r is the vector extending from
the field inversion point at the surface to some point in the pinch-off
zone and $\theta$ is the angle between the vector r and the interface. The
field inversion point is considered the origin or source point because
it is the point in the channel where carriers first depart from the
surface (from a charge sheet point of view). In the representation of
the electron distribution by (72), then, we are taking w to be the arc
length measured from the interface for a fixed r and a variable $\theta$. Since
w*, $\sigma$, and N are functions of r only, the assumption becomes that we
can represent the electrons by a Gaussian distribution that extends in
a curvilinear fashion around a circular contour. In Fig. 20 the polar
coordinates are shown schematically in an expanded view with all of
the r axes emanating from the field-inversion point, $y_i$.

Fig. 20—Curvilinear coordinate axes in the pinch-off zone.

In polar coordinates (89) and (90) become


$$\frac{J_r}{q} = D\,\frac{\partial n}{\partial r} + \mu n\,\mathcal{E}_r \qquad (91a)$$

$$\frac{J_\theta}{q} = D\,\frac{1}{r}\frac{\partial n}{\partial\theta} + \mu n\,\mathcal{E}_\theta. \qquad (91b)$$

Then using n as represented by (72) in (91) and utilizing (66), we 
obtain for the current components 


J. = quN exp(—n) | B- Be wee (92) 


and 


J, = quN exp(—n) 
N’ = (wa — a*) (w — w*)? {/1)\' 
46+ 0,/F + 2=2) 5, wort (4) |. (98)
where





(w — w*)? 1 oy 
= SA ease os ’ 
" Qo” r 00 
oy Ow* dw 
SE CPT So = fe Qin 
or or or (94) 


and the prime denotes differentiation with respect to r and y in the 
two-dimensional scalar potential. From the divergence condition (74) 
we find 


1dn ni1op\10¢ on  nop\ dd 
Pees ere a Cert ee | nye 
( a0 m1 a) 28 4 (% Z x) mv Oys Pe) 


where we can consider ¢ as some pseudo-potential field defined by the 
relations 





— 7% 
-= 06/a0 = a-p(* “), 


o 


— 0¢/dr = E+ Be x + (=) Sc eo) (5) : (95b) 


Using (92) and (93) in (95a), we find the divergence relationship to be 


(w — w*) (wo -o*) _ 1Llop 
\“-e a }- a +iie 
’  (w — w*) (w — w*)? {1 \' 
+ {8+ bela pepe (%) If 


N’ w — w* _ (@ — o*)? 1\ 1 op] _ oe 
[E+( a )s. 2 (5) +23) — v%, a) 


At this point we need a representation for the electric fields $\mathcal{E}_w$ and
$\mathcal{E}_r$. Consider $\mathcal{E}_w$ first. If drift and diffusion were equal and n were
described by (72), then one could show that $\mathcal{E}_w = \beta_e[(w - w^*)/\sigma^2]$.
We will assume that even though drift and diffusion are not equal, the
functional form of $\mathcal{E}_w$ remains the same under conditions of current
flow and is altered only in magnitude so that we may write


$$\mathcal{E}_w = K_1\,\beta_e\,\frac{(w - w^*)}{\sigma^2}, \qquad (97)$$


where $K_1$ is some number that is constant throughout the pinch-off
zone. We will take the representation a step further by assuming that
all electric fields may be represented by these functions so that we can
write


$$\mathcal{E}_r = -K_2\,\frac{\beta_e w^*}{\sigma^2} \quad \text{and} \quad \mathcal{E}_r^* = -K_3\,\frac{\beta_e w^*}{\sigma^2} \qquad (98)$$


for the surface and along the streamline, respectively, where $K_2$ and
$K_3$ are constants in the pinch-off zone, which must be determined.
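The statement that equal drift and diffusion imply a field proportional to $(w - w^*)/\sigma^2$ can be verified numerically. The sketch below assumes the flux form $J/q = D\,\partial n/\partial w + \mu n \mathcal{E}_w$ with D = mu * beta_e and the Gaussian profile (72); the function names and values are hypothetical, and the factor k1 plays the role of $K_1$ in (97).

```python
def log_density_gradient(w, w_star, sigma, dw=1e-12):
    """Central-difference d(ln n)/dw for the Gaussian profile of (72)."""
    def ln_n(x):
        return -((x - w_star) ** 2) / (2.0 * sigma ** 2)
    return (ln_n(w + dw) - ln_n(w - dw)) / (2.0 * dw)


def balance_field(w, w_star, sigma, beta_e):
    """Field for which drift exactly cancels diffusion across the profile."""
    return beta_e * (w - w_star) / sigma ** 2


def net_flux_per_mu_n(w, w_star, sigma, beta_e, k1=1.0):
    """(D dn/dw + mu n E_w) / (mu n), with E_w = k1 * balance_field."""
    diffusion = beta_e * log_density_gradient(w, w_star, sigma)
    drift = k1 * balance_field(w, w_star, sigma, beta_e)
    return diffusion + drift


# Hypothetical profile: center 50 nm deep, width 10 nm, evaluated at 30 nm.
W, W_STAR, SIGMA, BETA_E = 30e-9, 50e-9, 10e-9, 0.0259
RESIDUAL = net_flux_per_mu_n(W, W_STAR, SIGMA, BETA_E, k1=1.0)
EXCESS = net_flux_per_mu_n(W, W_STAR, SIGMA, BETA_E, k1=2.0)
```

With k1 = 1 the residual vanishes to rounding error; any other value of k1 leaves a net transverse flux proportional to (k1 - 1), which is the imbalance that (97) allows for under current flow.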
From (97) and (98) we obtain 











LAK _ Kibe , (wo) 108, 
oe ge oF r 00 (99) 
and 
OG KB. Pie a a w* OB 
or a Se — KoBew dr \o? Ke o” or’ oo 
where 
x _ dw* 
° Or 


Substituting (97) and (98) into (78c), we obtain for the surface (and 
for the region very close to the surface) 


2 
a= Se \/(4) - Ha, (101) 
where 
A (0° By" (102) 


~ Bol Ke (w — w*)? + K30"] ° 


In the same way, in the vicinity of the streamline we have 


» _ At *\2 . 
Be => + \ / (£) — BoA*, (103) 


where 
(o? &)? 
* _— o_o ere 
a Bol Ki (w — w*)? + K3w*?] ° a 

Differentiating (101), we have 

19. _ 110A BoA _ 110A 

r 00. Ar 00 Pet A\? ~A7 00° (105) 

| ? (3) Te
and





ae _1 0A Bo _1ad 
or Aor Bek A\? A p Pe (106) 
9 (4) ye 


and the approximation is good throughout most of the hot-electron 
regime because A > >. In (105) and (106) the derivatives of A are 
written 


110A 2Ki(w — w*) 
5 =e 107 
Ar 06 Ki(w — w*)? + K2w*? 10%a) 
and 
100A — 2[— Ki?(w — w*)S, + K3a* Se] ; 
et ee eae ats 1 
A or Ki(w — w*)? + Kew*? d gj» (107) 
where we have used 
: od([i 
eae c (3) (107c) 
to transform the term in a’. 
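Using (78c) for the mobility we now find

$$\frac{1}{\mu}\frac{1}{r}\frac{\partial\mu}{\partial\theta} = -\frac{1}{2\beta_e}\frac{1}{r}\frac{\partial\beta_e}{\partial\theta} \qquad (108)$$

$$\frac{1}{\mu}\frac{\partial\mu}{\partial r} = -\frac{1}{2\beta_e}\frac{\partial\beta_e}{\partial r}, \qquad (109)$$


so that the derivatives of the mobility may be evaluated using (105)
and (106).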

Poisson’s equation may now be reduced to a simpler form using 
these results. Substituting (107c) and (107b) into (105) and (106), we 
find 








1 1 06. 2K? 1 1 06 
_— -— = —> = =0 11 
Ber 00 w*(K? + K3)’ Be r 00 no) 
and 
1 Be _ 1 Be _ = Sg pd (A 
B. or Bt or at dr \o?)" at) 
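The Poisson equation (76) in cylindrical coordinates is

$$\frac{\partial\mathcal{E}_r}{\partial r} + \frac{\mathcal{E}_r}{r} + \frac{1}{r}\frac{\partial\mathcal{E}_\theta}{\partial\theta} = -\frac{q}{\epsilon}\,(N_a + n). \qquad (112)$$

Using (110) and (111) in (99) and (100) to find $\partial\mathcal{E}_w/\partial w$ and $\partial\mathcal{E}_r/\partial r$,
we can reduce (112) to an equation in $(1/\sigma^2)$ for the surface (w = 0),
which is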


Kw* 








d {1 K.S3  Kow* | [K3 — K?\ Ky 1 
dr \o? o ore? K? + K3 oo! x2 ae 
Using symmetrical arguments for the streamline locus we have 


é (3) S? . Kaw® -K,.. -1 





K3a* a, + Kaa - += . (114) 


\ #2 
ro? a? Ne 


The parameters \,, Ae are modified Debye lengths 


Ne = = (115) 
= q(Na + Ne”) 
and 
\* = eBe 
q(Na + N) 


due to the presence of mobile charge. The traditional Debye length is 


given by 
Be 
r = = 
qQNa 


A useful result that we shall need later is found by subtracting (113)
from (114) to give


2K? _Ki(c\ Ks 
Ks ( eel se 7 (3) - K, % (116) 


We now employ the condition 


$$\oint \boldsymbol{\mathcal{E}}\cdot d\mathbf{s} = 0 \qquad (117)$$


over any path in the pinch-off zone since y is a conservative field. 
Using (97), (98), and (117) we have 


< pe d= | (K3 — Kz)Be a) ene: (118) 


To simplify this result, we will assume that we may ignore the 
dependence of 8, upon w on the left-hand side. Then by integrating 
over w, differentiating with respect to r, and using (110) and (111), we 
find
d 1 2(K3 = Ko)
HE ee I ee a ee 
- dr (3) Kio? — (119) 


Substituting (118) into (112) we find 
dw* w* 2(K3—K2) Ki 2K? 1 (2) 
ae = 8 (1 - 5) -Sl(-). (20 
dr r Kk, Ke ( 1 2 a) 


Substituting (118) into (113) we have 
dw* w* _ 2(Ks — Ko) K, 1 ( Oo ) 


‘dr r ky K3 Ks 


Subtracting (119) from (120) would yield (116) as before. Returning 
to (110) we use (118) to give 


1 ope _ 168, 2 2 _ Ks) - si], 
1 


(121) 


p* or B. or wt 


The results (109) and (121) are then used in (108) and (109) to obtain 
functions of K,, Kz, and K3, which are 


(122) 


; . ot = am is (w = 0, silicon interface), (123a) 
* 
Ze ae, (w — w*, streamline), (123b) 
u* r 06 
and 
1 ou* low 1 Kz — Ke * 
aa 2 ote Sz]. (123c) 


We may now return to the evaluation of (96). The right-hand side of 
(96) may be evaluated using (95b) to be 


1d B w-w*\10—B OF 
er esc a kg Oa pct 
? r 06 B+ ( oa? ji or 


N’ Ww — w* (w — w*)? OBe Bed [N’ 
-|E+( o )s.- 2 (2 :) |% 3 (Wr) 
+ £5 92 — Bw - WS. (2 :) - — B.(w — w*)S,, (+) 


ee a 





+ B. 9 9 9 


a o or? r 
_Be|N’ (wot), wot? (1) 
: ea , (3) | (124)
0


We need to consider (96) along the two separate loci. Along the main 
streamline w*, we have w — w* = 0, and we will neglect the diffusion 
component N’/N compared to the drift component. Using (124) and 
(122) in (96) with 4& = &*, we then find 





{WN al 
ve x ee or 
Be +f (1 + S2) - 1 2Be | 4 Ks — Ke =e (125) 
~ >)\# r Oo “a a 


where we have considered the second derivatives of 8. and w* to be 
unimportant. The second locus is along the interface (w = 0), and in 
that case (96), (97), (98), (110), and (123) are used to give 


wr a* w* K? 1 
(-«. o +4)x(4-z es 
wo owS, w?/1)\’ 
[eae -F(a)} 
w wf/1\ 1 K3 — Ke eye 
4 a Se (5) Jo ( 7a ) s,|} = v9, (126) 


which may be rewritten, using (119), as 


K+ 0 (1 -als :))= [mS (& al 
aa 


We will now assume that the electron distribution is well localized in
the pinch-off zone, so that $\sigma \ll w^*$.
Then the term $\nabla^2\phi$ may be dropped and (127) may be reduced to


2 
(3) 
K, Ky 
Ko = ee ea ee ee 
re (are z) 


(128) 


Another relationship between the electron distribution parameters 
is found from the total current (73). Using (78a) and (72) in (73),
a N nn ae { "d.

“T17 — 4VHo e "dw 

qw 2 2 pds 
+(Z) +() 


By’ (we a 
+ D 1+(—) +(— a “ 
0 (2) (2) oan Ne"dw, (129) 
where the notation $\langle\ \rangle$ represents an average over the w axis. The
assumption is that the respective arguments are weak functions of w 


compared to the exponential term. Then the equation may be written 
in simplified form as 











ah = a Siac, I 
ra Ne"dw + 7 Ne“"dw = =a? (180) 


where the average value of $\mathcal{E}_w$ is taken to be zero. These two terms
are the contributions of drift and diffusion, respectively, to the total
current for a position r in the pinch-off zone. Equation (119) or (120)
and (130) must be solved in the pinch-off zone by using boundary
conditions that are appropriate to describe the electron distribution
(72). Because this distribution makes sense only after a reversal of
the x field in the silicon has taken place, we cannot use the
boundary conditions (83) and (84), which were established for the
pinch-off point where the $\mathcal{E}_x$ field is directed towards the gate. Instead,
we must establish new boundary conditions somewhere beyond the
field reversal point, which itself is beyond the pinch-off point. It is
also noted that (125) and (128) are equations in $K_2$, $K_3$, and $K_1$ that
must be solved together with another relation provided by these revised
boundary conditions.


A.3.3 Revised boundary conditions 


The pinch-off point to which the boundary conditions (82) through
(87) apply occurs in the channel at a location closer to the source than
the field inversion point. In a long-channel device these points are
fairly close; in a short-channel device the separation grows since the
charge at the pinch-off point grows. The potential between the two
points is $V_G - V_T - V_{SAT}$. The boundary conditions at a point $y_1$ in
the channel just after field inversion has occurred are estimated in the
following way.

From (69) we obtain 

dS R ¥#? I(1-—R) dp 


||| eee ees 1 
wis Ghee WO mie
In calculating (131) we are considering the derivatives of R to be
negligible. The second term may be evaluated using (109) with r = y
and the one-dimensional form (64) of $\mu$ to obtain

du _ By € d¥ 


dy "8B, #2 dy 





(132) 





=. 

Substituting (132) into (131), using (79) for $\beta_e$, and solving for
$d\mathcal{E}/dy$, we find

__ ks 
(1—R) Bo” 


dy 

dy 

Using (133), (99), and (110) in (76), we obtain 
__ Rk f& BK, ( — (K,/K2)° 

(1-—R) Bo oj \1 + (Ki/K2)’ 





(133) 





y=y 








— : (Na+n). (134) 


In (134) we have mixed terms for two different positions. The term in
$\bar{\mathcal{E}}^2$ actually applies only at the pinch-off position, whereas the term
in $\sigma_1^2$ applies to a point $y_1$ just beyond the field-reversal point. We will
assume that we can use the term in $\bar{\mathcal{E}}^2$ at $y_1$ if we change $\bar{\mathcal{E}}$ to
$h\bar{\mathcal{E}}$, where h is a parameter > 1 that determines how much $\mathcal{E}_y$ has
increased in the interval $\bar{y}$ to $y_1$. For almost all levels of current above
threshold we can ignore the charge term on the right-hand side so that
we obtain the result


= xe 
fies \/ Pee Vie (EJ, (135) 


Ky = Kib 








where 


and 


_ 1-(K,/K2)? 


=e (Kay 


(136) 


The result predicts that as & increases, o, at first decreases as 
VKi /¥ for small values of & and then decreases as VK} for larger 
values of & . Using (98) to express the continuity of the longitudinal 
field, we can use (135) to find the boundary condition 


wt = Ki & U=8) 
~" KhE R * 


(137)

A further boundary condition is imposed by Gauss’ law at the oxide-
silicon interface between $\bar{y}$ and L as

$$\epsilon\,\mathcal{E}_w\big|_{w=0} = \epsilon_{ox}\,\mathcal{E}_{ox}\big|_n, \qquad (138)$$


where $\mathcal{E}_{ox}|_n$ is the normal field in the oxide. The reader may show
fairly easily that this condition may be written


dS, Cy 
dy 


Now the left-hand side using (97), (111), and (118) may be expressed 
as K?/K2 so that we find, after considerable algebra, the relationship 


Ks) _ Coo o(1-R 
(®) - € a R )e 


&. (139) 


or 
Ky _ a 
aa ab=a’, 
where 
na C.Bo(1 = R) 
a= hoe . (140) 


If we knew $Q_1$, we could obtain the boundary condition $N_1$ from the
normalization relation
$$N_1 = \frac{Q_1}{q\sqrt{2\pi}\,\sigma_1}. \qquad (141)$$
We will determine $Q_1$ later since it is not required at this point to
proceed with a solution of these equations. Also, the boundary condi- 
tion on &, is, from (140), 
Ky CoBo (= (1— R) , 
Gi, = Sh? = 
: Ky ne € R 
To establish the potential $V_1$ for the new boundary condition, we
need two contributions, $\Delta_1 V$ and $\Delta_2 V$, determined as follows. From
the pinch-off point to the field-reversal point we have $\Delta_1 V = \bar{Q}/C_o +
V_T - V_{FB}$, and from the field reversal point to $y_1$ we have $\Delta_2 V$. In this
section, as a general approximation we may say


La (J Bdy — vo+ 0) 


elox 


(142a) 





or, alternatively, if we integrate from $y_i$, the field reversal point, then





y 
L== i Bdy. (142b) 
Ji 


élox
For $y = y_1$, the new boundary condition, we have $\int_{y_i}^{y_1} \mathcal{E}\,dy = \Delta_2 V$,
which we can then use in (142b), along with (142a), to find


Ag V = Bo ( 4) b. 





R 


Using $\bar{Q}/C_o$ from (18c), the total voltage drop between $\bar{y}$ and $y_1$ is
therefore


AV= V; = Vsat = AV + AoV 


7 G=R),  & 1 y\ 
= Vr — Vrp + Bo 7 b+ C, 1+ (=44) . (148) 
We then have for r; 
2 AV 
aS hel B ve 


where an average value of $\mathcal{E}$ between $\bar{y}$ and $y_1$ has been used. Using
(143) with b = 1, we have


2 1-R 
r= | Ven Ven + Bo ( 7 


C, + C, ian 
(1 sae: a 1+ ace )) (145) 


A.3.4 Approximate solutions to the equations 





Using the integrating factor —(1/r) and the boundary values w7, r1, 
we may solve (120) to obtain 


r 


w* = Cr in() Pea r, (146a) 


ry 1 


* * * * 
ge oe -om()+o+%, “= om(2)+2, (146b) 


K;-—K\ Ki 2K? 1 («\ 
=2( —) -+(1-——5)-s(-]) «a 
¢ ( K, K ( K+ a] Ky; () ey 
and is considered to be constant. Used in (119), this result can be 
written, by using (140) and ignoring the term in a/),., as 


1dv_ = (C+a) 1 


v dw* SS). ot"
where $v = (1/\sigma^2)$, which then yields





* a 
o? = g? (=) (147) 
Wy 
where 
a a (147b) 


We now use (115); assuming that $K_1 \ll K_2$ and that $K_2 \approx K_3$ we can
write (115) as


Ks — Ke = > — qN. (148) 


This condition on $K_1$ amounts to saying that the field that is pushing
carriers away from the surface is much smaller than the field that is 
driving the current; this is always found to be the case in the on region. 

To find N, we return to (130) and assume that at the position $y_1$ the
dominant conduction component is drift. That is, at the saturation
point the diffusion current accounts for about one third to one half of
the current flow but at a distance $r_1$ further down the channel, drift
has again become dominant. If drift has become dominant, then we
can say from (130)

$$Q = \frac{I}{W\mu\,\mathcal{E}}, \qquad (149a)$$

and as velocity saturation becomes predominant, this becomes

$$Q = \frac{I}{W v_s}. \qquad (149b)$$
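The drift-dominated charge and the corresponding peak density can be estimated as follows. This is a sketch under stated assumptions: (149b) is read as Q = I/(W v_s), the drift form as Q = I/(W mu E), and the peak density is obtained from the Gaussian normalization $Q = qN\sigma\sqrt{2\pi}$ for a well-localized distribution. Function names and numerical values are hypothetical.

```python
import math

Q_E = 1.602e-19   # electronic charge q (C)


def sheet_charge_drift(current, width, mu, e_field):
    """Charge per unit area when drift carries the whole current."""
    return current / (width * mu * e_field)


def sheet_charge_saturated(current, width, v_sat):
    """Velocity-saturated limit of the drift expression."""
    return current / (width * v_sat)


def peak_density(sheet_charge, sigma):
    """Peak N of a Gaussian profile of width sigma carrying sheet_charge."""
    return sheet_charge / (Q_E * math.sqrt(2.0 * math.pi) * sigma)


# Hypothetical values (SI units): 1 mA in a 10 um wide device.
CURRENT, WIDTH, V_SAT, SIGMA = 1.0e-3, 10e-6, 1.0e5, 10e-9
Q_SAT = sheet_charge_saturated(CURRENT, WIDTH, V_SAT)
N_PEAK = peak_density(Q_SAT, SIGMA)
```

In the saturated limit the charge no longer depends on the local field, which is why its gradient along the streamline vanishes and the neglect of diffusion at this position is self-consistent.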


From (149b) we can see that the gradient of the charge along the 
streamline actually goes to zero in the limit of velocity saturation. 
Therefore, the assumption of total drift is reasonable. With this 
assumption (149b) gives us a boundary condition Q, at yi, and there- 
fore, N,, which would be 


I I 
N, = = Wat in = > ——— , 
a ee ae ae | A 


from the normalization of the charge, which approaches the velocity- 
saturated condition shown. From (149b) we can write the equation for 


N for any position as 
I 
= —-—. 150 
Wu; V2060 120) 
Using (150) and (108) in (148) and then the result in (146c), we find
Ke ¢ BoKsQo¥? wi” 2-0 
2 ew tw ?. 1 
Kie 42x 3 rm a 
Since C is independent of position, we have from (151) 

a= 4/8. (152) 
Substituting from (135), (137), and (140) we can reduce (151) to 


Z SH pl NIA USEY” BGC) 
C=Cta= am ay es oe. (158) 


In writing (148) we assumed that K3 — Kz « K3, Kz, and we have used 
this fact again here. It is noted from (147) that (152) implies the result 


C=2 


S. = - (C+ a). (154) 


We now use (150) in (125). From (150) or (149) we have 





, * * 
ee 2, (155) 


Es dr u* dr odr 
and using this result and (107c) in (125), we obtain 


1 dy oedfl 1 1+ S2 
Fe te og aE i w 
#3| @ dr | 2dr (3)| Be s+ o | 
where we have dropped all terms in (c/w*)? (such as the term in 6/w*) 
and o”/rw* because we are assuming that the Gaussian is localized 


well enough that (c/w*)? « 1. Using (100) for the streamline and 
(114), we obtain 


S, 8 d/l 1.1 4:83 
Ky 5 +3 Kot 2 (4) =-( + 52 ) 











o dr ea 
and then substituting from (119) and (146c) 
1+ S82 
aah Se ee 1 
= B3(C +a) og’ 198) 
a oa as So 


where the term (1/K2)(o/A-)” has been considered small. 
We are going to assume that in the range of high fields in which we 
are interested, 


C> 1, ky, 
and S., = 0, which means that (dw*/dr) = (dw/dr) [cf. (94)], so that the
streamline locus is effectively a straight line. Then (128) and (156) 
may be subtracted, and (146c) may be used to give 


1 














3 
S*=-(C +a) (157) 
: (1 + K;) (° = *) 
2 
fice (C + a)’ 
Cod | (158) 
and 
_ 2 (C + a)? Ctra ; 
Ky= ge [i+ ee |+( 9 ) &. (159) 


Using (138) and (153) we then find 


sae 2/5 . 2/5. /4\3/5 
Kk, = (24) , K2 = (25) (2) (160, 161) 
2/5 3/5 4/5 1/5 
Sey phy ge ee 7k 
m= (3) () tale) () om 


2/5 1 3/5 
Cta= (2) out (2) : (163) 


a 


These are the constants that characterize the electric fields. They are 
determined by the parameter a, or, equivalently, &, through (140). 
These results are plotted in Fig. 21 over the total range of & for a 
practical device. All of the assumptions we have made are validated 
by the plots. We now use these results to determine the length of the 
pinch-off zone L — y. From (117) we have 

w? = 2(K3 — Ke) (Vos — Vi) 


(164) 


oO aa Ki Ks Be 
Using (103) for 6. along the streamline and substituting (147) for o, 
we obtain a result for w*, which is 


3/4 
(Vos — FN) . es) 


* 
* — Bred Oe C + 3/4 
ogee at F 


Now w* = r*@ and L — y; = r*cos 0, so that we may write the length 
of the pinch-off zone as 


cos 6 wa; 
6 of? 


(Vos — Vi)Bo 


3/4 
(C + a)*4 e | F (166) 


L-y,= 
[Figure 21: plot of the field constants against the field E (V/cm) and its normalized form a.]



Fig. 21—Variation of the field constants with pinch-off field <4 or its normalized 
form, a. The device parameters are t,, = 250A and N, = 5 x 10* cm 


where θ is the angle formed by the interface and the vector from the
point y, to the streamline axis. Using (135), (137), (140), and (163) for 
01, #1, and (C + a), respectively (under the condition b =~ 1), we have 


cos 6 € R \" 
L-y;= , ott gn ( /4(Vps — V,)?4. (167) 





We now require an estimate of θ. In solving for the constants K1,
K2, and K3, we have assumed that S,, ≈ 0 and that S3 was constant,
which implies that θ can be considered constant since w = rθ, and so


9=C (in +), (168) 
V1 

where the ln term must be written as some average. We cannot actually
estimate θ because the solution does not take into account the effect
of the finite depth of the drain junction. We will, therefore, use a value
of θ = 45 degrees to expedite calculation of (167). This is a reasonable
assumption based on the examination of two-dimensional numerical
solutions of this region. (Note that the function cos θ/θ is not a strong
function anyway, varying between 0.5 and 0.85 for θ between 30 and
45 degrees).

The result (167) applies for voltages Vps > V, but not to the situation 
when V, > Vps > Vsar. To extend the result over the interval AV = 
Vi — Vsar and thus to extend L — y, to L — ¥y, we will write (167) 
approximately as 


7 his 
ALy =[- y= exe 
-Ail(Vps — Vsar + A2AV)*4 — (A, AV)*“], (169) 
where 





4/5 1/2 _ p\3/4 
Ai = cos 0 1 gi/2 Co 1-k Bae (170) 
6 \y € R 


and h is determined by the condition that L — y = r, for Vps = 
Vsar + AV and Az is an additional constant that we must find. Using 
(145) in (169) and for the saturation condition, noting that the current 
in velocity saturation may be written I = WuQ. €?/(1 — R)&, we 
have 











9 1/4 @rr 1 1/5 
(2) At. + A) — AY]. (71) 


(h+D\C,) we Be ge? 


To determine A, and h, we require another relation, which is given 
by the condition of the continuity of the derivative of the current at 
the boundary between the triode and the saturation regions. This 
condition is found to be [cf. (48)] 





ee dy [ I 
Q/C, = re (4a). (172) 


Since Q ~ I/Wu, for higher channel fields, (172) becomes 
_ Bay


1= qv: 





(173) 


From (169) we have 


dy h® 3 1 + dVr/dVps 
dVpg 9/2" 4 (Vos — Vear + Ag AV)!4 


8h Ay 3/4 3/4 
— 9 eR [((Vps — Vsar + A2AV)* — (A2AV)*]. (174) 


For Vps = Vsarr, (174) reduces to 


— 1/5 





where we have considered (dV;/dVps) « 1. Using (175) in (1738) gives 
the condition 

3 hVA, 

4 AV AVIA BR 

In writing (174) we have assumed that the dependence of h’” on 

& and hence Vps can be ignored. Noting that for 7 = y we have L — 
y=L—y, =r and Vps — Vsar = AV; then using (144) in (169) we 
have 


= 1. (176) 


2AV h®A, AV*4 


h+ne 9 en) 
Using this result in (176) yields 
3 #\ 
A, = car 4) 2 (178) 


Since h > 1 and & = & in the range of interest here, Az will be a 
small number (i.e., <0.1, typically). We are, therefore, justified in 
treating the square brackets as 1 in (171) and so we find for h 


LO A Ad 

1/5 _ <0 pe 
(h+ 1h? = £2 (2) GPA," (179) 
If we considered a typical value of & = 10° V/cm (i.e., well into the 
hot-electron regime) and use A; = 175 [cf. (170)], then we find h = 
2.8. For higher values of & , h would increase, although we would not 
expect values much higher than & = 10° V/cm in practice. Therefore, 
we will treat h generally as a constant with a value 2.5 through 3.0, 
especially since it appears in L — y, the result of major interest, only 

as hl, 




A.4 Discussion of results 


The result (169) is valid for high values of pinch-off field & so that 
the hot-electron approximations [cf. (105) and (106)] are valid. For 
low values of &, L — y becomes anomalously large according to (169) 
and is obviously incorrect. However, it has been shown that for gate
voltages near and below $V_T$, a good representation of ΔL is


AL, =L-y= \ / 2 ay (Vos — Veer). (180a) 


For gate voltages in the ‘on’ region (180) becomes notoriously gross. 
If we now combine the results (180) and (169) in the form 
1 1 if 
— = — + —,, 
AL ALy AL, 
we obtain a representation of yy = L — AL, which is good for all regions 
of operation. 
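
As a rough numerical illustration of the combination described above, the following sketch assumes that the two length estimates are joined by a reciprocal sum, so that the smaller of the two dominates the result. The function name `combine_delta_l`, the argument names `dl_high_field` and `dl_depletion`, and the sample lengths are hypothetical labels chosen here, not quantities taken from the paper.

```python
def combine_delta_l(dl_high_field: float, dl_depletion: float) -> float:
    """Combine two channel-length-modulation estimates by a reciprocal sum.

    Assumption: 1/dL = 1/dl_high_field + 1/dl_depletion, so the result is
    always smaller than either input and tracks the smaller one.
    """
    if dl_high_field <= 0.0 or dl_depletion <= 0.0:
        raise ValueError("both length estimates must be positive")
    return 1.0 / (1.0 / dl_high_field + 1.0 / dl_depletion)


if __name__ == "__main__":
    # Hypothetical lengths in cm, chosen only to show the limiting behaviour.
    # When the high-field estimate blows up (low field), the depletion
    # estimate is recovered; when the depletion estimate is too large
    # (device well on), the high-field estimate is recovered.
    print(combine_delta_l(1.0e3, 2.0e-5))
    print(combine_delta_l(1.0e-5, 1.0e-4))
```

Because the combination is smooth in both arguments, the resulting length changes continuously as the bias moves from one regime to the other.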

We demonstrate this agreement by the curve shown in Fig. 22. The 
curves show data and theory for a 0.5-um device. The lower solid 
theory line shows the agreement obtained using (179) to determine ¥, 
which is then used in the velocity-saturated model of the companion 
paper. The agreement is quite good. The upper solid line shows the 
result that is obtained if the AL, [cf. (180a)] is used alone to predict ¥ 
and there is considerable error. However, there is not as much error 
as one might expect on the basis of (L/L — AL ,)Isar, which is normally 
considered to be the case, where $I_{SAT}$ is the current at the onset of
saturation. This is clear from the comparison of AL, AL,, ALy, and 
AL’ shown in Fig. 23. The variation of AL in Fig. 23 is much more 
than is manifested in Fig. 22 because the velocity-saturated current 


[cf. (20b)] 


es wo.c| (3) os = Vr) = (Qo/Co)? _ 3] 


(180b) 





a* a* at 

loses its dependence upon y. In the limit of total velocity saturation 
there is no dependence at all. Therefore, we are completely justified 
in making the approximation such as (168). In fact, the coefficient A, 
could be considerably in error without having much effect. The same 
may be said about the radius of curvature of the junction. The fact 
that we treated a vertical instead of a cylindrical junction and did not 
consider explicitly the junction depth is really of very little conse- 
quence. The most important feature of the result is the inverse 
dependence of ALy upon ¥ *”. It is this dependence that allows AL 
to track the device geometries and applied voltages in a continuous 
fashion. 
[Figure 22: drain current $I_{DS}$ (mA) versus $V_{DS}$ (V), with curves labeled for gate voltages of 1.65 V, 2.65 V, and 4.65 V.]

Fig. 22—Comparison of drain voltage data (dashed curve) of a velocity saturated - 
device with theory using the composite result (128) (lower solid curve) and the simple 
depletion result (127) (upper solid curve). The effect of series resistance (~509) has 
been removed from the data and the device parameters are L = 0.5 um, tor = 250A 


ee 10 pm, Na = 5 X 10° em", 7; = 0.25 wm, po = 650 cm2/V-s, 6 = 0.03, and v, = 10° 
cm/s. 


APPENDIX B 
B.1 Introduction 


In a previous work, we had examined the problem of ensuring the 
continuity of current in moving from the subthreshold to the above- 
threshold regions by introducing a voltage parameter $\delta V_G$ into the
expression for the saturated current so that it became

$$I_{AT} = \frac{W \mu C_o}{2y}\,(V_G - V_T + \delta V_G)^2, \tag{181}$$
[Figure 23: ΔL (cm) versus $V_{DS}$ (V), with curves for $V_{GS}$ = 1.65 V and $V_{GS}$ = 3.65 V.]


Fig. 23—Variations of ALy, AL; (solid curve) and AL (dashed curve) for the device 
in Fig. 11. Also indicated is the effective AL’ if the current in saturation is I = IsarL 
L— AL, where Isarz is the pinch-off value. AL’ is less than AL because of the effects of 
velocity saturation. 


where $I_{AT}$ refers to the above-threshold current.’® The parameter $\delta V_G$
was introduced to account for the absence of diffusion in the above-
threshold formulation. It was determined on the basis of a unique
value of current at the threshold condition from consideration of the 
above-threshold and subthreshold currents. However, to be consistent 
in this approach and to have a useful result, we must also require 
continuity of the derivative of the current between the subthreshold 
and above-threshold regions of operation. 

In this appendix we will re-examine the conditions of the continuity 
of current and its derivative at two important transitions in the device, 
namely, (1) the transition between the subthreshold and the above- 
threshold saturation regions, and (2) the transition between the triode 
region and the saturation region, both of which are above threshold. 
These conditions are needed to determine R, the fraction of the total 
current carried by diffusion at the pinch-off point above threshold. 
It is noted that the former approach” of substituting the parameter
$\delta V_G$ into the above-threshold result for all gate voltages is not complete.
When determined self-consistently, the parameter $\delta V_G$ gives only the
voltage for which the currents below and above threshold are equal 
but says nothing about their derivatives. 

The transition from the nonsaturated region below threshold to the 
triode region above threshold will be considered only after the results 
have been obtained since it only exists over a drain voltage of about 2 
kT/q and hence is relatively unimportant. Also, the transition from 
linear to triode regions below threshold does not need to be considered 
since its continuity has already been established.” 


B.2 Transition from subthreshold to above-threshold saturation conduction 


Since we are concerned at this transition with currents flowing just 
above the threshold voltage (i.e., almost at the threshold point), then 
we are justified in using the above-threshold current expressions that 
do not include velocity saturation because the channel fields for such 
voltages are less than %. From (83) we have 


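$$\frac{Q}{C_o} = \left(\frac{1-R}{R}\right)\frac{kT}{q}\,\frac{(C_o + C_s F)}{C_o}, \tag{182a}$$

where

$$C_s = \sqrt{\frac{\epsilon q N_A}{2(V + V_{BS} + 2\phi_F)}} \tag{182b}$$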


is the semiconductor depletion capacitance at the pinch-off point in 
the channel. At the threshold condition we will have Cs = Cs, i.e., the 
same value at the source and the pinch-off point. For a constant T, Q 
is dependent on Veg only through Cs, on Vps only through F(Vps), 
and on Vgzs through both of these. (It may also have some dependence 
upon Vgs and Vps through the parameter R.) We will assume at this 
point that the dependence of Q upon bias parameters is sufficiently 
weak that we may ignore its derivatives in the calculation of the 
derivatives of the current above threshold. We shall re-examine the 
validity of the assumption later. The result (82) suggests that we 
should use it in the triode region equation 


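$$I_{DS} = \frac{\mu C_o W}{L}\left[(V_{GS} - V_T(V_{DS}))\,V_{DS} - \frac{V_{DS}^2}{2}\right] \tag{183}$$

to obtain a modified form of the saturation current, which is

$$I_{SAT} = \frac{\mu C_o W}{2y}\left[(V_{GS} - V_T(V_{DS}))^2 - \left(\frac{Q}{C_o}\right)^2\right]. \tag{184}$$
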
The threshold voltage has been written as a function of drain voltage
to show that the 3/2 power terms of the conventional triode expression
may be represented.!® We conclude from (184), therefore, that true
square law behavior of the device may only be observed for
$V_{GS} - V_T(V_{DS}) \gg Q/C_o$.
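
To make the square-law remark concrete, the sketch below assumes that the modified saturation current has the form $I_{SAT} = (\mu C_o W/2y)[(V_{GS} - V_T)^2 - (Q/C_o)^2]$ and evaluates how far it falls below the pure square law. The function names and all numerical values in the example are hypothetical choices for illustration, and the mobility prefactor is written explicitly as an assumption.

```python
def saturation_current(v_gs, v_t, q_over_co, mu, c_o, width, y):
    """Saturation current with the pinch-off charge term (Q/C_o)^2 subtracted.

    Units are the caller's responsibility; with mu in cm^2/V-s, c_o in F/cm^2,
    and width and y in cm, the result is in amperes.
    """
    drive = v_gs - v_t
    return mu * c_o * width / (2.0 * y) * (drive ** 2 - q_over_co ** 2)


def square_law_shortfall(v_gs, v_t, q_over_co):
    """Fractional amount by which the current falls below a pure square law."""
    drive = v_gs - v_t
    if drive <= 0.0:
        raise ValueError("gate drive must be positive")
    return (q_over_co / drive) ** 2


if __name__ == "__main__":
    # Hypothetical values: Q/C_o = 0.1 V, gate drives from 0.2 V to 2 V.
    for drive in (0.2, 0.5, 1.0, 2.0):
        print(drive, square_law_shortfall(drive, 0.0, 0.1))
```

The shortfall is simply $(Q/C_o)^2/(V_{GS} - V_T)^2$, so it only becomes negligible once the gate drive is several times $Q/C_o$.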

In considering the saturation current at the transition between the
two regions, we are going to assume that an offset gate voltage of η
above the threshold voltage must be applied to the device in order for
the above-threshold theory to predict a finite current flowing at the
transition point. This situation is illustrated in Fig. 24, which shows
data for a reasonably short-channel device ($L = 2.5\ \mu$m, $t_{ox} = 500$ Å, and
$N_A = 10^{16}$ cm$^{-3}$) in the region near the threshold voltage. The voltage
η is shown as the voltage increment from $V_T$ to the onset of the
straight section. An equivalent way of stating this condition is that
the above-threshold current merges with the subthreshold current not


[Figure 24(a): drain current versus gate voltage $V_{GS}$ (V) for a device with $L = 2.5\ \mu$m, $t_{ox} = 500$ Å, and $N_A = 10^{16}$ cm$^{-3}$.]

Fig. 24(a)—Variation of drain current with gate voltage showing the determination
of η and its relation to $V_T$ for linear region characteristic.
for $V_{GS} = V_T$ but for a slightly larger value of $V_{GS} = V_T + \eta$, so that
(184) becomes

$$I_{SAT}\Big|_{V_T+\eta} = \frac{\mu C_o W}{2y}\left[\eta^2 - \left(\frac{Q}{C_o}\right)^2\right]. \tag{185}$$


In writing the saturation current we have used y in place of L in (184) 
and (185), where AL = L — y is the channel-length modulation in 
saturation operation. The relative size of AL to L can be significant 
in a short-channel device. 

At the gate voltage $V_T$, the subthreshold theory predicts the flow of
a finite diffusion current, which would correspond to an effective 


[Figure 24(b): drain current $I_{DS}$ (μA) versus gate voltage $V_{GS}$ (V).]

Fig. 24(b)—Variation of drain current with gate voltage showing the determination
of η and its relation to $V_T$ for saturation region characteristic.
surface potential $\phi_s = 2\phi_F$. Since the transition between the regions
takes place for $V_{GS} = V_T + \eta$, in a similar way we should introduce a
corresponding offset surface potential into the subthreshold formulation.
It may be defined as the additional surface potential above the
threshold value ($\phi_s = 2\phi_F$) that is achieved when the additional voltage
η is applied. The subthreshold current may be written?”


_ gD \ /___: RI) oF gat 
tn Vana too (g) Mee 88 


where $\phi_s$ is the effective surface potential in subthreshold operation.
Therefore, when a merging of the subthreshold and above-threshold
currents is considered, the transition will occur not for $\phi_s = 2\phi_F$ but
rather for a slightly larger surface potential of $2\phi_F + \xi$. The voltage
variation of the current is predominantly in the exponential term
through the effective surface potential $\phi_s$, which is obtained from the
equation

$$V_{GS} = V_{FB} + \phi_s + \frac{1}{C_o}\sqrt{2\epsilon q N_A (V_{BS} + \phi_s)}\cdot F. \tag{187}$$

Therefore, we will neglect the voltage dependence of $\phi_s$ in the
pre-exponential term. The factor F has been defined previously for
short-channel devices.’® At the threshold condition it takes the specific form


R= 
L+r- ; VKVps — ; [KVps + 2r;VK(Vbi + Vas + Vos) + rey 
~ 5 [2rjVK (Ws + Vas) + 731” 


L~ VRVi, , (188) 
where 
$K = 2\epsilon/qN_A$,
$r_j$ = the junction depth,
and
$N_A$ = the doping concentration

and it is assumed that $V_{bi} = 2\phi_F$.

The transition from subthreshold to above-threshold conduction as
a function of drain voltage is obvious in a short-channel device because
the variation with drain-source voltage of the effective $\phi_s$ (i.e., the
average $\phi_s$ in the channel determined from two-dimensional
charge-sharing techniques) below threshold and the $V_T$ above threshold are
quite pronounced. Actually, this transition always exists even in a
long-channel device since we may never have absolutely no channel
length modulation except in the limit of a device of infinite length.
Hence, there will always be some curve of I versus $V_{DS}$, which passes
from the subthreshold to the above-threshold region at some $V_{DS}$, so
that we may consider this condition in a general way. By imposing the
condition of continuity of current, we obtain from (185) and (186) the
relation

$$\frac{1}{2}\left[\eta^2 - \left(\frac{Q}{C_o}\right)^2\right] = \left(\frac{kT}{q}\right)^2 \frac{C_s}{C_o}\, e^{\xi'}, \tag{189}$$


where $C_s$, as mentioned in the definition (182b), is the capacitance of
the semiconductor depletion region. This capacitance will be approximately
the same at any position between the source and the edge of
the drain-depletion region (i.e., the saturation point in the channel)
for gate voltages up to the threshold condition, since we are assuming
a negligible field in the channel for the subthreshold formulation. The
term ξ represents the additional surface potential that is required
above threshold to achieve a matching of the solutions. We expect this
increase in surface potential introduced in (186) to correspond to the
concomitant increase in gate voltage η. The solutions are joined,
therefore, not at $V_{GS} = V_T$ but at a slightly higher voltage, which is
determined from (187) to be

$$V_T + \eta = V_{FB} + 2\phi_F + \xi + \frac{1}{C_o}\sqrt{2\epsilon q N_A (V_{BS} + 2\phi_F + \xi)}\cdot F + \frac{kT}{q}\frac{C_s}{C_o}, \tag{190}$$


where the term $(kT/q)(C_s/C_o)$ is an attempt to represent the change
in mobile charge in the channel between the two conditions. We have 
simply used Q, = N4Xa, the charge in the channel under diffusion- 
limited conditions, where Xz is the effective depth of channel charge.” 
We must now impose the condition of continuity of the derivatives at 
the transition. From (184) we have (neglecting the derivatives of Q) 


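$$\frac{dI}{dV_{DS}} = -\frac{I}{y}\frac{dy}{dV_{DS}} - \frac{W}{y}\,\mu C_o\,\eta\,\frac{dV_T}{dV_{DS}}, \tag{191}$$

and from (186) we have

$$\frac{dI}{dV_{DS}} = -\frac{I}{y}\frac{dy}{dV_{DS}} + I\,\frac{q}{kT}\frac{d\phi_s}{dV_{DS}}. \tag{192}$$
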
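We therefore have at the transition from (191) and (192) that

$$\eta\,\frac{dV_T}{dV_{DS}} = -\left(\frac{kT}{q}\right)\frac{C_s}{C_o}\, e^{\xi'}\,\frac{d\phi_s}{dV_{DS}}. \tag{193}$$

From (187) we obtain for the derivative of the threshold voltage

$$\frac{dV_T}{dV_{DS}} = \frac{1}{C_o}\sqrt{2\epsilon q N_A (V_{BS} + 2\phi_F + \xi)}\,\frac{dF}{dV_{DS}} \tag{194}$$

or, for the surface potential at the transition

$$\left.\frac{d\phi_s}{dV_{DS}}\right|_{\phi_s = 2\phi_F + \xi} = -\frac{1}{(C_o + C_s F)}\sqrt{2\epsilon q N_A (V_{BS} + 2\phi_F + \xi)}\,\frac{dF}{dV_{DS}}. \tag{195}$$

Using (194) and (195) in (193) we obtain

$$\eta = \frac{kT}{q}\,\frac{C_s}{C_o + C_s F}\, e^{\xi'}. \tag{196}$$

Using (190) and (187) at $V_{GS} = V_T$, we find

$$\eta = \xi + \frac{1}{C_o}\sqrt{2\epsilon q N_A}\left(\sqrt{V_{BS} + 2\phi_F + \xi} - \sqrt{V_{BS} + 2\phi_F}\right)F + \frac{kT}{q}\frac{C_s}{C_o} \tag{197}$$

and by expanding the square root term, since $\xi \ll V_{BS} + 2\phi_F$, we have

$$\eta = \frac{(C_o + C_s F)}{C_o}\,\xi + \frac{kT}{q}\frac{C_s}{C_o}. \tag{198}$$

Combining (196) and (198) we obtain

$$\xi = \frac{kT}{q}\,\frac{C_s C_o}{(C_o + C_s F)^2}\, e^{\xi'} - \frac{kT}{q}\,\frac{C_s}{(C_o + C_s F)}$$

or

$$\xi' = \frac{C_s C_o}{(C_o + C_s F)^2}\, e^{\xi'} - \frac{C_s}{C_o + C_s F}, \tag{199}$$
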
where $\xi' = q\xi/kT$. We may, therefore, determine ξ once values of oxide
thickness, doping density, and substrate bias have been specified by
using this transcendental relationship. An interesting feature of this
parameter is its lack of dependence upon T.
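
The transcendental relationship can be solved numerically. The sketch below is a minimal illustration, assuming the relation has the form $\xi' = A e^{\xi'} - B$ with $A = C_s C_o/(C_o + C_s F)^2$ and $B = C_s/(C_o + C_s F)$, and using plain bisection to find the positive root. The function name `solve_xi_prime`, the bracketing interval, and the example capacitance ratio are assumptions made here for illustration.

```python
import math


def solve_xi_prime(c_o, c_s, f, lo=1.0e-9, hi=50.0, tol=1.0e-12):
    """Positive root of xi' = A*exp(xi') - B by bisection.

    A = c_s*c_o/(c_o + c_s*f)**2 and B = c_s/(c_o + c_s*f). Only the ratio
    c_s/c_o matters, so any consistent capacitance units may be used.
    """
    total = c_o + c_s * f
    a = c_s * c_o / total ** 2
    b = c_s / total

    def residual(x):
        return a * math.exp(x) - b - x

    # The residual is negative near zero (A < B whenever F > 0) and grows
    # exponentially, so a sign change brackets the single positive root.
    if residual(lo) >= 0.0 or residual(hi) <= 0.0:
        raise ValueError("root is not bracketed; adjust lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical example: C_s/C_o = 0.5 with the short-channel factor F = 0.8.
    xi_prime = solve_xi_prime(c_o=1.0, c_s=0.5, f=0.8)
    print(xi_prime)
    # xi in volts follows from xi = xi' * kT/q (about 0.0259 V at room temperature).
    print(xi_prime * 0.0259)
```

Since the temperature enters only through the normalization $\xi' = q\xi/kT$, the root itself depends on the capacitances and on F alone.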

By this method we have ensured continuity of the current and of its
derivative in passing from the subthreshold to the above-threshold
region. It is realized that the absolute value of the predicted current
will be somewhat low around $V_G = V_T$ because the subthreshold region
does not incorporate drift and because the above-threshold result does
not incorporate diffusion. However, this is not expected to be of any
serious consequence. To calculate the current for voltages between
$V_T + \eta$ and $V_T$, the surface potential is found from (198) and then
(186) is used.

The results of calculating ξ and η are shown in Figs. 25 and 26 as a
function of oxide thickness and substrate doping over a wide range of
these parameters. These parameters are independent of gate bias but
have a slight dependence on $V_{DS}$ through the parameter F. In Figs. 25
and 26 a value of F = 0.8 was used, and to illustrate the influence of
the drain bias, the curves are also shown in dashed lines for F = 0.6.
Generally, the results show that in the range of useful device operation
(say $N_A \approx 1$-$2 \times 10^{16}$ cm$^{-3}$ for $t_{ox}$ = 500 Å or $N_A$ = 3-5 × $10^{16}$ cm$^{-3}$ for
$t_{ox}$ = 250 Å), the parameters η and ξ are slowly decreasing functions of
both doping and oxide thickness and may be readily determined from
these simple calculations. For rather thick oxides and high doping
levels, η and ξ tend to rise again, which shows that the subthreshold
region is penetrating farther into the above-threshold region. One can


[Figure 25: ξ (V) versus $N_A$ (cm$^{-3}$ × $10^{15}$).]

Fig. 25—Variation of ξ, the surface potential increment above threshold, as a function
of substrate doping for several oxide thicknesses. The dashed curves show the effect of
drain voltage through the short-channel factor F.







[Figure 26: η versus $N_A$ (cm$^{-3}$ × $10^{15}$), with the doping axis running from 0 to 40.]

Fig. 26—Variation of η, the gate voltage above $V_T$ for a merging of regions, as a
function of substrate doping and oxide thickness. The variation with the factor F is
shown by dashed lines.


think of η and ξ as offset voltage parameters that describe the protrusion
of the subthreshold region into the above-threshold region.
We may now solve for $Q/C_o$ using (196) and (189) and obtain

$$\frac{Q}{C_o} = \frac{kT}{q}\,\frac{(C_o + C_s F)}{C_o}\,\xi'\sqrt{1 - 2/\xi'}. \tag{200}$$

We may then use this result, together with (182a), to determine

$$R = \frac{1}{1 + \xi'\sqrt{1 - 2/\xi'}}. \tag{201}$$



From (200) and (199) we conclude that $Q/C_o$ is approximately independent
of bias parameters. The approximation is equivalent to ignoring
the dependence of $C_s$ upon V. However, this is the same approximation
we made to arrive at (191), so it is a consistent one. Within
the same approximation, we see from (201) that R is also weakly
dependent upon bias parameters, another fact we have used to arrive
at (191). We show later in the discussion how good the approximation
is.
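
The two results can be evaluated directly once $\xi'$ is known. The sketch below assumes the forms $R = 1/(1 + \xi'\sqrt{1 - 2/\xi'})$ and $Q/C_o = (kT/q)\,[(C_o + C_s F)/C_o]\,\xi'\sqrt{1 - 2/\xi'}$; the function names, the room-temperature value of kT/q, and the sample inputs are illustrative assumptions rather than values from the paper.

```python
import math


def charge_factor(xi_prime):
    """The common factor xi' * sqrt(1 - 2/xi'), which is real only for xi' > 2."""
    if xi_prime <= 2.0:
        raise ValueError("xi_prime must exceed 2 for a real result")
    return xi_prime * math.sqrt(1.0 - 2.0 / xi_prime)


def diffusion_fraction(xi_prime):
    """Fraction R of the pinch-off current carried by diffusion."""
    return 1.0 / (1.0 + charge_factor(xi_prime))


def pinch_off_charge_voltage(xi_prime, c_o, c_s, f, kt_over_q=0.0259):
    """Q/C_o in volts; kt_over_q defaults to an assumed room-temperature value."""
    return kt_over_q * (c_o + c_s * f) / c_o * charge_factor(xi_prime)


if __name__ == "__main__":
    # Hypothetical inputs: xi' = 2.8, C_s/C_o = 0.5, F = 0.8.
    print(diffusion_fraction(2.8))
    print(pinch_off_charge_voltage(2.8, c_o=1.0, c_s=0.5, f=0.8))
```

In this form R depends on the bias only through $\xi'$, which is why a weak bias dependence of $\xi'$ carries over to R.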
Fig. 27—Variation of R, the fraction of diffusion at the pinch-off point as a function
of $N_A$ and $t_{ox}$ for two values of F (and therefore $V_{DS}$).

Fig. 28—Variation of Q, the channel charge at the pinch-off point as a function of
$N_A$ and $t_{ox}$ for two values of F.

The values of R and Q are plotted in Figs. 27 and 28 for substrate
biases of $V_{BS}$ = 0, −3 V as a function of substrate doping and oxide
thickness. As for the other parameters, the effect of $V_{DS}$ through the
parameter F is not that great. Both R and Q are moderate functions
of doping for a given oxide thickness. The value of R ranges from 0.2
to 0.6, but typically for $t_{ox}$ = 500 Å, $N_A = 10^{16}$, or $t_{ox}$ = 250 Å, $N_A = 5 \times 10^{16}$,
we find R = 0.4, i.e., about one half of the current is drift and
one half of the current is diffusion at the pinch-off point. As in the
case of η and ξ for higher dopings and thick oxides, Q increases
considerably and R drops off. Therefore, as the subthreshold transition
point pushes higher above the threshold voltage, the pinch-off charge
increases, and the fraction of the total current carried by drift tends
to increase.

It should be noted that the parameter R has been derived from
conditions near threshold in which region velocity saturation of carriers
can be ignored. The form of Q [cf. (182b)] applies with or without
velocity saturation; our interpretation of Q is that the effects of velocity
saturation enter only through T, which becomes a hot-electron temperature
under velocity-saturated conditions. From (182b) this is
equivalent to R being independent of velocity saturation, and this is
supported by (201) since ξ′ is temperature-independent [cf. (199)]. It
is, therefore, reasonable for R to be used for all gate voltages even
though it was derived from conditions at the transition.
LETTERS


Comments on “Off-Line Quality Control in Integrated Circuit Fab- 
rication Using Experimental Design,” by M. S. Phadke, R. N. Kackar, 
D. V. Speeney, and M. J. Grieco* 


I have recently read with extreme interest the article “Off-Line 
Quality Control in Integrated Circuit Fabrication Using Experimental 
Design,” by M. S. Phadke et al. (hereafter denoted as Phadke). 

The authors acknowledge the contributions of Professor Genichi 
Taguchi in this report. I believe my understanding of the Taguchi 
method is sufficient to raise the issues below. It is believed that the 
issues are sufficient so as to challenge the usefulness of the Taguchi 
Method if Phadke is used as a primer. If this should happen it is 
believed that a new and useful tool might be lost to the engineer. 

The introduction to Phadke gives a concise presentation of the 
Taguchi Method. The problem arises when Phadke has to establish a 
signal-to-noise ratio (s/n) for analysis and selection of the optimal 
process by maximizing s/n. One must first understand the meaning of 
the 5 (experiments 5, 15, and 18) or 10 data points in Table III of 
Phadke. It is abundantly clear from the table headings (and the text) 
that these multiple measurements at each experimental condition fit 
into two categories: 

1. They are the result of multiple measurements at different spots 
on a given wafer, and 

2. They are the result of measurements on a second wafer processed 
at the same time as the first. 

Phadke processes these 5 or 10 measurements to establish the mean 
pre-etch line width and the respective standard deviation listed in 
Table IV. It is at this point Phadke breaches the teachings of Taguchi. 
The standard deviation listed in Table IV is not the variance (square 
of the standard deviation) of which Taguchi speaks. The standard 
deviation values in Table IV are in fact due to at least four possible 
factors: 

1. Experimental measurement error (treated only cursorily on page 
1281). 

2. Variations from wafer to wafer (undoubtedly small for adjacent 
wafers cut from a large boule). No information is given about the 
source of the wafers, except that they were scarce. 

3. Variations at the five locations due to the effects of the mask 
variations, or other asymmetries in the wafer processing (e.g., the 
means of all 33 measurements of the pre-etch line width data—Table 


*B.S.T.J., 62, No. 5 (May-June 1983), pp. 1273-309. 
III—show that the “bottom” line width is distinctly different from the
other four, i.e., its average value is 2.81 um versus 2.62 to 2.69 for the 
other four; in fact, one is about 90-percent confident that the “bottom” 
data are different from the “left” data). 

4. Variations in location due to the experimental process (i.e.,
experiment 2 versus experiment 9, etc.). 

It is extremely clear that the signal-to-noise (η = log x/s) data in
Table IV (page 1286) are not even remotely related to s/n that Taguchi 
teaches since the standard deviation is composed primarily of the 
effects of repeat measurements for one experiment. That is, the s/n 
in Table IV is simply another response, just as line width (x) is a 
response. On the other hand, Taguchi’s noise (variance) is due to 
repeat experiments at the same or nearly the same experimental 
conditions (see the Wheatstone Bridge example in Ref. 3 of Phadke). 

It is in the reduction to practicality where Phadke seemingly slips. 
The average s/n (η) in Table V is simply the appropriate average of
the s/n = log x/s in Table IV. For example, the average s/n for factor
A level 1 is (1.4803 + 1.3512 + ... + 1.2709)/9 = 1.28568 with a standard
deviation of 0.13749. If one were interested in calculating the s/n ratio 
for Table V using the s/n of Table IV as a response variable, then the 
values would be those shown below for two of the 24 values that can 
be calculated (more will be said about the log transformation later). 


Table V—Pre-etch line width for average s/n

                                              Average s/n
Factor                              Level 1          Level 2          Level 3
A   Mask Dimension                  1.2857           1.5166
BD  Viscosity Bake Temperature      (B1D1) 1.3754    (B2D1) 1.3838    (B1D2) 1.4442
B   Viscosity                       1.4098           1.3838
D   Bake Temperature                1.3796           1.4442
C   Spin Speed                      1.3663           1.3503           1.4868
E   Bake Time                       1.4328           1.4625           1.3082
F   Aperture                        1.5368           1.4011           1.2654
G   Exposure Time                   1.3737           1.3461           1.4836
H   Developing Time                 1.3881           1.4042           1.4111

Overall average s/n = 1.4011.

THIS VALUE SHOULD BE 1.2857/0.1375 = 9.351 (factor A, level 1) and
1.51657/0.23747 = 6.3864 (factor A, level 2) BEFORE ANY TRANSFORMATION

In the case of the correct data in Table V (as per Jugle) the noise is
due to fluctuations in the other eight process variables. This is the
noise of which Taguchi speaks. However, the object (first paragraph
of Section I in Phadke) of this experiment was to determine a process
for which the line width was under control and its variance was
minimal (i.e., the line width s/n ratio was a maximum). If this is the
case, the data in Table V would look like Table Va below (note that
only three values have been calculated and no transformation has
been applied to the x/s values).

One can now follow the Taguchi Method and ANOVA on the data 
shown in Table VII and Table Va to establish a process condition 
under which the appropriate process control variable can shift the 
mean pre-etch line width with little, or no, change in the variance and 
select the other process variable settings so as to minimize the variance 
(maximize the s/n). If one uses the data in Table V (as per Jugle) and 
the data in Table IV, one can determine a process variable, which 
controls the mean s/n across the wafer, and a set of process variables, 
which minimizes variance in this s/n value. The two conditions may, 
or may not, be the same, so the appropriate trade-off would have to 
be made. 

There are two other comments which must be made. First, the log 
transformation of x/s is of concern and it should be used only discrim- 
inately. Taguchi (as well as Phadke et al.) are products of the com- 
munications industry where x is generally very large (10° through 10") 
and s is comparatively small (10° through 10%). In this case the log 
transformation not only makes sense, but it is mandatory. However, 
in the Phadke experiment, and in many other experiments, 0.02x < s 
< 0.20x. In this case the compression of larger values by the log 
transformation may deemphasize larger effects and might have con- 
tributed to the second comment below. Second, on page 1303 Phadke 
talks about “implementation and the benefits of optimum levels.” It 


Table Va—Pre-etch line width s/n 


Factor   Level 1   Level 2   Level 3 
A        5.448     6.804     — 
BD       8.587     •         • 
•        •         •         • 
•        •         •         • 
•        •         •         • 

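$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_9}{9} = 2.487$$

$s = 0.4565$, obtained from the squared deviations $(x_i - \bar{x})^2$

$$\text{s/n} = \frac{\bar{x}}{s} = \frac{2.487}{0.4565} = 5.448$$
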
is somewhat disturbing that after optimization by their analysis they 
had to change the control variable (Taguchi’s signal factor) called 
exposure setting from a nominal position (90) to beyond the experi- 
mental range (140 versus 108, assuming a fixed aperture of 2). This 
would suggest that there exists either a major uncontrolled variable in 
the experiment or a strong unrecognized interaction controlling the 
results, and that the positive outcome (in terms of implementation 
and benefits) is largely fortuitous. 

Finally, in the calculation for Table Va the mean value for the pre- 
etch width calculated by this author does not agree with the value 
published in Phadke in Table VII (i.e., 2.487 versus 2.39). The data in 
Table IV are self-consistent (x, s, η); hence one would expect that the 
x values are correct. This may be a transcription error. However, there 
are enough small errors in the calculated data I have checked (i.e., 
only about 5 percent of the calculations) to suggest some further 
review is necessary. 

I would like to thank T. Barker and L. Smith (both of Xerox 
Corporation) for helpful discussions. 

Don B. Jugle 
Project Manager 
Xerox Corporation 


Reply to Letter by D. B. Jugle 


Designing a process or a product that is robust against all noises is 
the key objective of off-line quality control. However, not only do we 
usually not know all noise factors, but even if we knew them it is 
inefficient and unnecessary to include all of them in an off-line quality 
control experiment. It is generally adequate to consider a few impor- 
tant noises in the experiment, with an anticipation that a design 
robust against the chosen noises will be robust against all noises. The 
confirmation experiment, then, verifies the robustness of the optimum 
design. 

Signal-to-noise ratio (s/n) is a measure of process variation, which 
in turn is a measure of robustness. Depending on the problem, a 
different method is used to compute s/n. In the article by M. S. Phadke 
et al.* the s/n was calculated from the variation of the line width 
between wafers and within wafers. This s/n takes into consideration 


*M. S. Phadke, R. N. Kackar, D. V. Speeney, and M. J. Grieco, “Off-Line Quality 
Control in Integrated Circuit Fabrication Using Experimental Design,” B.S.T.J., 62, 
No. 5 (May-June 1983), pp. 1273-309. 
the line width variation resulting from material variation between 
wafers, the nonflatness of a wafer, nonuniformity of projection print- 
ing across a wafer and from one exposure to another, etc. But, this 
s/n does not include the effect of normal variation in the process 
factors around their nominal values. Another way to compute s/n is 
to consider the data from all experiments at a particular level of a 
factor. For example, to calculate the s/n for level 1 of factor A we 
would use all observations on pre-etch line width corresponding to 
experiments 1 through 9. This s/n would then include the effect of 
variation in the other eight process parameters. However, the variation 
in the process parameters among experiments 1 through 9 is too wide 
compared to their normal variation. So, this s/n is also not perfect. 
The s/n used in our paper was selected on the basis of engineering 
judgment. 
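
The difference between the two ways of computing s/n described above can be sketched in Python. The line widths below are invented for illustration, only three experiments are shown rather than nine, the ratio is left untransformed (no logarithm), and the standard deviation uses the n - 1 divisor; all of these are assumptions rather than details taken from the article.

```python
from statistics import mean, stdev


def signal_to_noise(values):
    """Return the untransformed ratio x-bar / s for a list of measurements."""
    return mean(values) / stdev(values)  # stdev uses the n - 1 divisor


# Hypothetical line widths for three experiments that share one factor level.
line_widths = {
    1: [2.4, 2.5, 2.6],
    2: [2.9, 3.0, 3.1],
    3: [1.9, 2.0, 2.1],
}

# Method 1: one s/n per experiment, from the variation within that experiment.
per_experiment = {run: signal_to_noise(obs) for run, obs in line_widths.items()}

# Method 2: one s/n for the factor level, pooling all observations at that level.
pooled = signal_to_noise([x for obs in line_widths.values() for x in obs])

print(per_experiment)
print(pooled)
```

In this made-up data the pooled value is much smaller than any per-experiment value, because pooling also counts the shift in mean line width caused by the other factors changing from one experiment to the next.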

In his letter to the editor, Mr. Jugle has suggested in Table V (as 
per Jugle) that the s/n for various factor levels should be computed 
from the s/n for individual experiments. This should never be done 
because we are not interested in the variation of s/n. Also, the s/n 
suggested in Table Va (as per Jugle) is not appropriate because it 
ignores the linewidth variation within wafers and between wafers. 

The main reason for taking the log of x/s is that the factorial effects 
have better additivity in the log domain. For example, without the log 
transformation it is possible for the prediction of x/s to be negative. 
This unrealistic prediction is avoided by the log transformation. 
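
A small numerical sketch may make the point about negative predictions concrete. It applies a simple main-effects prediction (overall mean plus the deviation of each chosen level average) first to x/s itself and then to log10(x/s). The numbers and the function name are hypothetical, and for brevity the logs of the averages stand in for averages of logs.

```python
import math


def additive_prediction(overall_mean, chosen_level_means):
    """Main-effects prediction: overall mean plus each factor's deviation from it."""
    return overall_mean + sum(m - overall_mean for m in chosen_level_means)


# Hypothetical numbers: overall average x/s and the averages at the chosen
# levels of three factors, each well below the overall average.
overall_ratio = 2.0
level_ratios = [1.2, 1.2, 1.2]

# Prediction made directly on x/s.
raw_prediction = additive_prediction(overall_ratio, level_ratios)

# Same model applied to log10(x/s), then transformed back to a ratio.
log_prediction = additive_prediction(
    math.log10(overall_ratio), [math.log10(r) for r in level_ratios]
)
back_transformed = 10 ** log_prediction

print(raw_prediction)    # negative on the raw scale
print(back_transformed)  # positive after back-transformation
```

Because the back-transformation is an exponential, the prediction made in the log domain can never be negative, whatever the size of the factor effects.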

After publication of the article,* the authors realized that there were 
some typographical errors in the data. However, these errors have 
little impact on the final results. The presence of these errors is 
regretted. 

The confirmation experiment reported in the article referenced 
showed without doubt that there was a four-fold reduction in the 
process variance and a three-fold reduction in the cases of unopened 
window. Also note that the off-line quality control method has been 
applied successfully in improving numerous processes in AT&T and 
elsewhere. 


M. S. Phadke 

R. N. Kackar 

D. V. Speeney 

M. J. Grieco 

AT&T Bell Laboratories 


G. Taguchi 
Consultant 




PAPERS BY AT&T BELL LABORATORIES AUTHORS 


COMPUTING/MATHEMATICS 


Bentley J. L., A Case-Study in Applied Algorithm Design. Computer 17(2):75- 
88, 1984. 

Cargill T. A., The Blit Debugger. J Syst Soft 3(4):277-284, 1983. 

Christen C., Hwang F. K., Detection of a Defective Coin With Partial Weight 
Information. Am Math Mo 91(3):173-179, 1984. 


Chung F. R. K., Graham R. L., Edge-Colored Complete Graphs With Precisely 
Colored Subgraphs. Combinatori 3(3-4):315-324, 1983. 

Fishburn P. C., Probabilities of Dominant Candidates Based on First-Place 
Votes. Discr App M 7(2):131-140, 1984. 

Gale W. A., Koenker R., Pricing Interactive Computer Services. Computer 
J 27(1):8-17, 1984. 

Graham R. L., Yao F. F., Finding the Convex Hull of a Simple Polygon. J 
Algorithm 4(4):324-331, 1983. 


Hanson S. J., Kraut R. E., Farber J. M., Interface Design and Multivariate Analysis 
of UNIX Command Use. ACM T Offic 2(1):42-57, 1984. 

Hartigan J. A., Kleiner B., A Mosaic of Television Ratings. Am Statistn 
38(1):32-35, 1984. 

Johnson D. S., The NP-Completeness Column—An Ongoing Guide. J Algorithm 
4(4):397-411, 1983. 

Logan B. F., Integrals of High-Pass Functions. SIAM J Math 15(2):389-405, 1984. 

Rose D. J., Convergent Regular Splittings for Singular M-Matrices. SIAM J 
Alg 5(1):133-144, 1984. 

Saksena V. R., Cruz J. B., Robust Nash Strategies for a Class of Nonlinear 
Singularly Perturbed Problems. Int J Contr 39(2):293-310, 1984. 

Togai M., Japan’s Next Generation of Robots. Computer 17(3):19-25, 1984. 


Tullis T. S., The Formatting of Alphanumeric Displays—A Review and Anal- 
ysis. Human Fact 25(6):657-682, 1983. 


ENGINEERING 


Abidi A. A., Meyer R. G., Noise in Relaxation Oscillators. IEEE J Soli 18(6):794- 
802, 1983. 

Arroyo C. J., A Helical-Wrap Peel Strength Tester. Adhes Age 27(2):12-15, 1984. 

Aumiller G. D., Infrared Dye-Laser in the 685- to 880-nm Range (Letter). Appl 
Optics 23(5):651, 1984. 

Banu M., Tsividis Y., Fully Integrated Active RC-Filters in MOS Technol- 
ogy. IEEE J Soli 18(6):644-651, 1983. 

Burr D. J., Ackland B. D., Weste N., Array Configurations for Dynamic Time 
Warping. IEEE Acoust 32(1):119-128, 1984. 

Chraplyvy A. R., Optical Power Limits in Multichannel Wavelength-Division- 
Multiplexed Systems Due to Stimulated Raman Scattering. Electr Lett 
20(2):58-59, 1984. 

Dautremont-Smith W. C., Lopata J., Optical Monitoring for Rate and Uniformity 
Control of Low-Power Plasma-Enhanced CVD. J Vac Sci B Technol 1(4):943- 
946, 1983. 

Ebeling K. J., Coldren L. A., Wavelength Self-Stabilization of Coupled-Cavity 
Semiconductor Lasers. Electr Lett 20(2):69-70, 1984. 

Elvidge J. et al., Microprocessor Implementation of an FFT for Iono- 
spheric VLF Observations (Letter). IEEE Geosci 22(2):171-174, 1984. 

Fichtner W., Nagel L. W., Penumalli B. R., Petersen W. P., Darcy J. L., The Impact 
of Supercomputers on IC Technology Development and Design. P IEEE 
72(1):96-112, 1984. 




Gehrels N. et al., The Development of a Segmented N-Type Germanium Detec- 
tor, and Its Application to Astronomical Gamma-Ray Spectroscopy. IEEE 
Nucl S 31(1):307-311, 1984. 

Green M. L., Coleman E., Bader F. E., Sproles E. S., The Physical Metallurgy and 
Electrical Contact Resistance of Mechanically Alloyed Cu-Ru Compos- 
ites. Mater Sci E 62(2):231-239, 1984. 

Haber F., Barness Y., Yeh C. C., An Adaptive Interference Canceling Array 
Utilizing Hybrid Techniques. IEEE Aer El 19(6):795-804, 1983. 

Hall P. M., Dudderar T. D., Argyle J. F., Thermal Deformations Observed in 
Leadless Ceramic Chip Carriers Surface Mounted to Printed Wiring 
Boards. IEEE Compon 6(4):544-552, 1983. 

Hayes J. R., Capasso F., Gossard A. C., Malik R. J., Wiegmann W., Bipolar Transistor 
With Graded Band-Gap Base. Electr Lett 19(11):410-411, 1983. 

Henein G., Origin of Blistering Observed on Forming Gas-Annealed Ti-Pt 
Ohmic Contacts. Thin Sol Fi 109(2):115-125, 1983. 

Kastalsky A., Luryi S., Gossard A. C., Hendel R., A Field-Effect Transistor With a 
Negative Differential Resistance. IEEE Elec D 5(2):57-60, 1984. 

Koren U., Arai S., Tien P. K., Three-Channel Buried-Crescent InGaAsP Laser 
With 1.51-μm Wavelength on Semi-Insulating InP. Electr Lett 20(4):177-178, 
1984. 

Kramer B. et al., Data Acquisition, Reduction, and Transmission of Atmospheric 
Electricity Data on High-Altitude Scientific Balloons (Letter). IEEE Geosci 
22(2):169-171, 1984. 

Levine B. F., Bethea C. G., Ten-MHz Single Photon Counting at 1.3 μm. Appl 
Phys L 44(6):581-582, 1984. 

Mucha J. A., Correction of Nonlinear Derivative Diode-Laser Data in Standard 
Addition Analyses. Appl Spectr 38(1):68-73, 1984. 

Novembre A. E., Bowden M. J., Effect of Varying the Composition of Copolymers 
of Glycidyl Methacrylate and 3-Chlorostyrene (GMC) on Electron Litho- 
graphic Performance. Polym Eng S 23(17):975-979, 1983. 

Olsson N. A., Dutta N. K., Liou K. Y., Dynamic Linewidth of Amplitude-Modulated 
Single-Longitudinal-Mode Semiconductor-Lasers Operating At 1.5 μm 
Wavelength. Electr Lett 20(3):121-122, 1984. 

Olsson N. A., Dutta N. K., Tsang W. T., Logan R. A., Threshold Current Characteristics 
of GaAs Lasers Under Short Pulse Excitation. Electr Lett 20(2):63-64, 1984. 


Ralls K. S., Skocpol W. J., Jackel L. D., Howard R. E., Fetter L. A., Epworth R. W., 
Tennant D. M., Discrete Resistance Switching in Submicrometer Silicon Inversion 
Layers: Individual Interface Traps and Low-Frequency (1/f?) 
Noise. Phys Rev L 52(3):228-231, 1984. 

Russ G. J., A Comparative Study of Electroplated Palladium as a Contact 
Finish. IEEE Compon 6(4):389-395, 1983. 

Sharma S. P., Dasgupta S., Reaction of Contact Materials With Vapors Emanating 
From Connector Products. IEEE Compon 6(4):553-559, 1983. 

Smith P. W., Tomlinson W. J., Nonlinear Optical Interfaces—Switching Behav- 
ior. IEEE J Q El 20(1):30-36, 1984. 

Song B. S., Gray P. R., A Precision Curvature-Compensated CMOS Bandgap 
Reference. IEEE J Soli 18(6):634-643, 1983. 

Sundahl R. C., Electrooptic Materials for Optical Communications. Ferroelectr 
50(1-4):457-466, 1983. 

Taylor G. N., Wolf T. M., Stillwagon L. E., The Role of Inorganic Materials in 
Dry-Processed Resist Technology. Sol St Tech 27(2):145-155, 1984. 

Thurston R. N., Boyd G. D., Brightness and Contrast of Nematic Liquid-Crystal 
Bistable Configuration Displays. Displays 5(1):15-20, 1984. 

Vasile M. J., The Velocity Dependence of Secondary Ion Yields. Nucl Instru 
218(1-3):319-323, 1983. 

Wagner A., Applications of Focused Ion Beams. Nucl Instru 218(1-3):355-362, 
1983. 

Winters J. H., Differential Detection With Intersymbol Interference and Frequency 
Uncertainty. IEEE Commun 32(1):25-33, 1984. 




Xydeas C. S., Kostic B., Steele R., Embedding Data Into Pictures by Modulo 
Masking. IEEE Commun 32(1):56-69, 1984. 


MANAGEMENT/ECONOMICS 


Graham D. R., Kaplan D. P., Sibley D. S., Efficiency and Competition in the Airline 
Industry. Bell J Econ 14(1):118-138, 1983. 

Porter R. H., A Study of Cartel Stability—The Joint Executive Committee, 
1880-1886. Bell J Econ 14(2):301-314, 1983. 


PHYSICAL SCIENCES 


Abrahams S. C., Bair H. E., DiSalvo F. J., Marsh P., Deuring L. A., Magnetic and 
Structural Phase-Transition at 363.7-K in Tetramethylammonium 
Hexacyanotrimethylenecyclopropanide. Phys Rev B 29(3):1258-1262, 1984. 

Abrahams S. C., Marsh P., Liminga R., Lundgren J. O., Pyroelectric Sr(NO2)2·H2O: 
Room Temperature Crystal Structure. J Chem Phys 79(12):6237-6241, 1983. 

Aeppli G. et al., Spin Correlations and Reentrant Spin-Glass Behavior in Amorphous 
Fe-Mn Alloys. 2. Dynamics. Phys Rev B 29(5):2589-2605, 1984. 

Aeppli G., Guggenheim H., Uemura Y. J., Spin Dynamics Near the Magnetic 
Percolation Threshold. Phys Rev L 52(11):942-945, 1984. 

Aur S., Kofalt D., Waseda Y., Egami T., Chen H. S., Teo B. K., Wang R., Local 
Atomic-Structure in Amorphous Mo50Ni50 by Resonance X-Ray Diffraction 
Using Synchrotron Radiation. J Non-Cryst 61-2(Jan):331-336, 1984. 

Benedek G., Toennies J. P., Doak R. B., Surface-Phonon Spectroscopy of LiF(001) 
by Inelastic Scattering of He Atoms: Theory and Interpretation of Time-of- 
Flight Spectra. Phys Rev B Condensed Matter 28(12):7277-7287, 1983. 

Berreman D. W., Domain-Wall Tension Allows Bistability in Imperfect Laminar 
Cholesteric Twist Cells. J Appl Phys 55(4):806-809, 1984. 

Bertz S. H., The Role of Symmetry in Synthetic Analysis—The Concept of 
Reflexivity. J Chem S Ch (4):218-219, 1984. 

Besomi P., Degani J., Wilson R. B., Heat-Treatment Effects on Indium Gallium- 
Arsenide Phosphide Double Heterostructure Material. J Appl Phys 55(4):1135- 
1138, 1984. 

Broek H. W., Tom C., Spatial Coherence of Multipath Underwater Acoustic 
Transmission—An Experiment At Two Frequencies Concurrently. J Acoust 
So 75(2):395-405, 1984. 

Bruinsma R., Aeppli G., Metamagnets and Frustration. Phys Rev B 29(5):2644- 
2651, 1984. 

Capizzi M., Modesti S., Frova A., Staehli J. L., Guzzi M., Logan R. A., Electron-Hole 
Plasma in Direct-Gap Ga1-xAlxAs and K-Selection Rule. Phys Rev B 
29(4):2028-2035, 1984. 

Chen H. S., Inoue A., Sub-Tg Enthalpy Relaxation In PdNiSi Alloy Glasses. J 
Non-Cryst 61-2(Jan):805-810, 1984. 

Coldren L. A., Ebeling K. J., Rentschler J. A., Burrus C. A., Wilt D. P., Continuous 
Operation of Monolithic Dynamic Single-Mode Coupled-Cavity Lasers. Appl 
Phys L 44(4):368-370, 1984. 

Dean P. J., Thomas D. G., Frosch C. J., New Isoelectronic Trap Luminescence in 
Gallium-Phosphide. J Phys C 17(4):747-762, 1984. 

Denbroeder F. J., Vandenberg J. M., Draper C. W., Microstructures of Cu-Zr Phases 
Formed by Laser Surface Treatment. Thin Sol Fi 111(1):43-51, 1984. 

Di Mauro L. F., Heaven M., Miller T. A., Laser-Induced Fluorescence Spectroscopy 
of Ionic Clusters Between Organic Cations and Inert Gases. Chem P Lett 
104(6):526-532, 1984. 

Dinnsen D. A. et al., Phonological Neutralization, Phonetic Implementation and 
Individual Differences. J Phonetics 12(1):49-60, 1984. 

Donnelly V. M., Flamm D. L., Dautremont-Smith W. C., Werder D. J., Anisotropic 
Etching of SiO2 In Low-Frequency CF4/O2 and NF3/Ar Plasmas. J Appl Phys 
55(1):242-252, 1984. 

Dutta N. K., Wilt D. P., Besomi P., Dautremont-Smith W. C., Wright P. D., Nelson 
R. J., Improved Linearity and Kink Criteria for 1.3-μm InGaAsP-InP Channeled 
Substrate Buried Heterostructure Lasers. Appl Phys L 44(5):483-485, 1984. 

Bergen J. R., Julesz B., Rapid Discrimination of Visual Patterns. IEEE Syst M 
13(5):857-863, 1983. 

Eibschutz M., Lines M. E., Chen H. S., Linewidth Asymmetries in the Mossbauer 
Zeeman Spectrum of Amorphous Iron-Metalloid Alloys. J Non-Cryst 61-2 
(Jan):1219-1224, 1984. 

Elman B. S. et al., Stoichiometric Determination of SbCl5-Graphite Intercalation 
Compounds Using Rutherford Backscattering Spectrometry. J Appl Phys 
55(4):894-898, 1984. 

Fleury P. A., Lyons K. B., Dynamic Central Peaks in Solids—Progress and 
Prospects. Ferroelectr 52(1-3):3-14, 1983. 


Gottscho R. A., Miller T. A., Optical Techniques in Plasma Diagnostics. Pur A 
Chem 56(2):189-208, 1984. 

Haavasoja T., Narayanamurti V., Chin M. A., Bhatt R. N., Observation of a High- 
Frequency Cutoff for Phonon Propagation in Liquid ⁴He. Phys Rev L 
51(26):2400-2409, 1983. 


Harbison J. P., Williams A. J., Lang D. V., Effect of Silane Dilution on Intrinsic 
Stress in Glow-Discharge Hydrogenated Amorphous-Silicon Films. J Appl 
Phys 55(4):946-951, 1984. 

Holmes R. J., Kim Y. S., Brandle C. D., Smyth D. M., Evaluation of Crystals of 
LiNbO3 Doped With MgO or TiO2 for Electrooptic Devices. Ferroelectr 
51(1-2):41-45, 1983. 

Hopkins J. B., Farrow L. A., Fisanick G. J., Raman Microprobe Determination of 
Local Crystal Orientation in Laser Annealed Silicon. Appl Phys L 
44(5):535-537, 1984. 

Huse D. A., Multicritical Scaling in Baxter’s Hard Square Lattice Gas. J Phys A 
16(18):4357-4368, 1983. 

Jackel J. L., Rice C. E., Veselka J. J., Proton Exchange in LiNbO3. Ferroelectr 
50(1-4):491-496, 1983. 

Johnson A. M., Glass A. M., Olson D. H., Simpson W. M., Harbison J. P., High 
Quantum Efficiency Amorphous-Silicon Photodetectors With Picosecond Re- 
sponse Times. Appl Phys L 44(4):450-452, 1984. 

Julesz B., Harmon L. D., Noise and Recognizability of Coarse Quantized Images 
(Letter). Nature 308(5955):211-212, 1984. 

Kahn D., Rabiner L. R., Rosenberg A. E., On Duration and Smoothing Rules in a 
Demisyllable-Based Isolated-Word Recognition System. J Acoust So 75(2):590- 
598, 1984. 

Keith H. D., Loomis T. C., Estimation of Crystallization Temperatures in 
Quenched Polyethylene. J Pol Sc PP 22(2):295-306, 1984. 

Kihlstrom K. E., Mael D., Geballe T. H., Tunneling α²F(ω) and Heat-Capacity 
Measurements in High-Tc Nb3Ge. Phys Rev B Condensed Matter 29(1):150-158, 
1984. 

Levine B. F., Bethea C. G., Single Photon Detection at 1.3 μm Using a Gated 
Avalanche Photodiode. Appl Phys L 44(5):553-555, 1984. 

Liu P. L. et al., Measurements of Intensity Fluctuations of an InGaAsP External 
Cavity Laser. Appl Phys L 44(5):481-483, 1984. 

Lovinger A. J. et al., Curie Transitions In Copolymers of Vinylidene Fluo- 
ride. Ferroelectr 50(1-4):553-562, 1983. 

Macrander A. T., Chu S. N. G., Strege K. E., Bloemeke A. F., Johnston W. D., 
Correlation Between Background Carrier Concentration and X-Ray Linewidth 
for InGaAs/InP Grown by Vapor-Phase Epitaxy. Appl Phys L 44(6):615-617, 
1984. 

Maloney T. J., Aspnes D. E., Arwin H., Sigmon T. W., Spectroscopic Ellipsometric 
and He Backscattering Analyses of Crystalline Si-SiO2 Mixtures Grown by 
Molecular-Beam Epitaxy. Appl Phys L 44(5):517-519, 1984. 




Mankiewich P. M., Craighead H. G., Harrison T. R., Dayem A. H., High-Resolution 
Electron-Beam Lithography on CaF2. Appl Phys L 44(4):468-469, 1984. 

Martinez O. E., Heritage J. P., Miller B. I., Dutta N. K., Nelson R. J., Threshold 
Temperature Dependence of Subnanosecond Optically-Excited 1.3-μm 
InGaAsP Lasers. Appl Phys L 44(6):578-580, 1984. 

Miller R. C., Dupuis R. D., Petroff P. M., High-Quality Single GaAs Quantum 
Wells Grown by Metalorganic Chemical Vapor Deposition. Appl Phys L 
44(5):508-510, 1984. 

Mitchell J. W., Kessler J. E., Purification of Optical Waveguide Glass Forming 
Reagents—Phosphorus Oxychloride. J Elchem Sc 131(2):361-365, 1984. 

Moller A., Nordheim A., Kozlowski S. A., Patel D. J., Rich A., Bromination Stabilizes 
Poly(dG-dC) in the Z-DNA Form Under Low-Salt Conditions. Biochem 
23(1):54-62, 1984. 

Ng K. K., Polito W. J., Ligenza J. R., Growth-Kinetics of Thin Silicon Dioxide in 
a Controlled Ambient Oxidation System. Appl Phys L 44(6):626-628, 1984. 

Olsson N. A., Tsang W. T., Logan R. A., Kaminow I. P., Ko J. S., Spectral Bistability 
in Coupled Cavity Semiconductor Lasers. Appl Phys L 44(4):375-377, 1984. 

Pelleg J., Diffusion of Phosphorus in TaSi2 Thin-Films. 1. Lattice and Grain- 
Boundary Diffusion in TaSi2/Si(Polycrystalline). Thin Sol Fi 110(2):115-127, 
1983. 

Pelleg J., Diffusion of Phosphorus in TaSi2 Thin Films. 2. Lattice and Short- 
Circuit Diffusion in TaSi2/SiO2. Thin Sol Fi 110(2):129-138, 1983. 

Phillips J. C., Chemical Bonding and Heats of Formation in Chalcogenide 
Network Compounds As2(S,Se)3 and Ge(S,Se)2. Phys Rev B Condensed Matter 
28(12):7038-7039, 1983. 

Pini R. et al., Continuously Tunable Multiple-Order Stimulated 4-Photon Mix- 
ing in a Multimode Silica Fiber. Optics Lett 9(3):79-81, 1984. 

Raghavan R. S., Growing Up Recoil-Free in India. Phys Today 37(2):38-44, 1984. 

Rossetti R., Beck S. M., Brus L. E., Direct Observation of Charge-Transfer 
Reactions Across Semiconductor—Aqueous-Solution Interfaces Using Tran- 
sient Raman Spectroscopy. J Am Chem S 106(4):980-984, 1984. 

Schweizer K. S., Stillinger F. H., High-Pressure Phase-Transitions and Hydrogen- 
Bond Symmetry in Ice Polymorphs. J Chem Phys 80(3):1230-1240, 1984. 

Silberg E., Chang T. Y., Ballman A. A., Caridi E. A., Doping and Electrical Properties 
of Mn in In1-x-yGaxAlyAs Grown by Molecular Beam Epitaxy. J Appl 
Phys 54(12):6974-6981, 1983. 

Stavola M., Infrared Spectrum of Interstitial Oxygen in Silicon. Appl Phys L 
44(5):514-516, 1984. 

Talpey T. E., Worley R. D., Infrasonic Ambient Noise Measurements in Deep 
Atlantic Water (Letter). J Acoust So 75(2):621-622, 1984. 

Tarng S. S. et al., Use of a Diode Laser to Observe Room-Temperature, Low- 
Power Optical Bistability in a GaAs-AlGaAs Etalon. Appl Phys L 44(4):360- 
361, 1984. 

Tell B. et al., Beryllium Implantation Doping of InGaAs. Appl Phys L 44(4):438- 
440, 1984. 

Temkin H., Dupuis R. D., Logan R. A., Van Der Ziel J. P., Schottky-Barrier Restricted 
Arrays of Phase-Coupled AlGaAs Quantum-Well Lasers. Appl Phys L 
44(5):473-475, 1984. 

Teo B. K., Chen H. S., Wang R., Antonio M. R., EXAFS of Glassy Metallic Alloys: 
Amorphous and Crystalline MoNi. J Non-Cryst Solids 58(2-3):249-274, 1983. 

Van Der Ziel J. P., Temkin H., Dupuis R. D., Mikulyak R. M., Mode-Locked 
Picosecond Pulse Generation From High-Power Phase-Locked GaAs-Laser 
Arrays. Appl Phys L 44(4):357-359, 1984. 

Warren W. W., Elhanany U., Brennert G. F., NMR Studies of Expanded Liquid 
Cesium. J Non-Cryst 61-2(Jan):23-28, 1983. 

Weston H. T., Daniels W. B., Temperature and Volume Dependence of the 
Thermal-Conductivity of Solid Neon. Phys Rev B 29(5):2709-2716, 1984. 

Wood T. H., Actual Modal Power Distributions in Multimode Optical Fibers 
and Their Effect on Modal Noise. Optics Lett 9(3):102-104, 1984. 




SOCIAL AND LIFE SCIENCES 


Poltrock S. E., Schwartz D. R., Comparative Judgments of Multidigit Numbers. J 
Exp Psy L 10(1):32-45, 1984. 


ERRATUM 


A. G. Fraser and S. P. Morgan, “Queueing and Framing Disciplines 
for a Mixture of Data Traffic Types,” AT&T Bell Lab. Tech. J., 63, 
No. 6, Part 2 (July-August 1984), pp. 1061-87. 

The illustrations in Figures 6, 7, and 8 in this paper were misplaced. 
In all cases the captions are correct, but the illustrations are in the 
wrong positions. The illustration captioned Figure 6 (page 1074) should 
be Figure 7, Figure 7 (page 1075) should be Figure 8, and Figure 8 
(page 1077) should be Figure 6. 